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i.    INTRODUCTION 

Consider  the  foliowtne;  optimization  problem  with  nonlinear 
equaiitv  constraints: 

min        t (x)  subject    to  c(x)    =    0  (i.l) 

where  f:  1^  *  IR  and  c:  ^R  ^  ->•  K  '^  are  twice  different iable  functions 
and  "1  <  n.  We  will  further  assune  that  the  Hessian  matrices  V -f  and 
{V~c^},  i=l,...,m,  are  Lipschitz  continuous  matrix  functions  of  x. 
Although  this  problem  arises  frequently  in  many  applications,  a  more 
common  version  involves  inequality  constraints.  We  confine  our 
attention  to  (1.1),  partJ.y  because  it  is  easier  to  address  and  partlv 
because  a  ^ood  understanding  of  algorithms  for  solving  f  1 .  1 ")  is 
essential  for  the  design  of  methods  for  the  more  general  problem.  ^Tote 
that  we  are  concerned  only  with  finding  a  local  minimum  of  (1.1). 
Finding    the    global   minimum   is   a   much   more    difficult    task. 

In  Section  2  we  discuss  some  of  the  Newton-type  methods  which  have 
been  proposed  to  solve  (1.1).  In  Section  3  we  describe  a  new 
Broyden-type  irethod,  which  was  suggested  bv  the  work  of  Hoodman  (1982). 
This  method  updates  an  n  x  (n-m)  matrix  which  approximates  a  "one-sided 
proiected  Hessian"  of  a  Lagrangian  function  for  (1.1).  "Hie  method  is 
easily  shown  to  have  a  local  one-step  o-superlLnear  convergence 
propertv.  We  also  give  a  new  proof  of  the  ^oggs-Tolle-'feng  necessary 
and  sufficient  condition  for  O-superlinear  convergence  of  a  class  of 
quasl-Newton  methods  for  solving  (i.l).  In  Section  4  we  describe  an 
algorithn  which  updates  an  approximation  to  a  "two-sided  projected 
Hessian,"   a    symmetric    matrix  of    order  n-m  which    can   be   expected      to      be 
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posltlve  definite  near  a  solution  to  (1.1).  This  idea  was  sug<^ested  hv 
Murray  and  Wright  (1*^78)  and  has  also  been  discussed  bv  several  other 
authors.  We  present  several  variants  of  this  algorithm  and  prove  that 
under  certain  conditions  they  all  have  a  local  two-step  n-suneriinear 
convergence  property.  Finally,  in  Section  S  we  present  some  numerical 
results  which  indicate  that  these  methods  may  be  verv  useful  in 
practice. 

'''hroughout      the     paper     we     will     use      II  xll    to   denote    the  '^uclidean 

9      1/9 

vector  norm  (I  xt)  '~  and  H  Bll  for  the  corresponding  induced  matrix 
norm,    max    {II  Bxll  |ll  xll    =    1}. 


C 


y 
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2.    NWTON-TYPE   METHOOS 


We  begin  by  estabilshin?  some  notation  and  sunmarizin^  the  well 
known  optinaiitv  conditions  for  (1.1);  proofs  of  these  mav  be  found  in 
Fletcher      (1981).        Let      ^(x)    =    Vf(x),       the      t^radient      of    the   obiectlve 

T 

function,  let  c(x)    =    [c  j^  (x)  ,  . . .  ,c^(x)  1 "  ,  and  let 

A(x)  =  [Vc,  (x)  ,  .  .  .  ,Vc  (x)],  the  n  x  m  matrix  of  constraint  {gradients. 
Now  let  X  be  a  aiven  point  with  A(x)  having  full  column  rank,  i.e., 
rank  tn,  and  let  t  =  n-n.  We  will  make  frequent  use  of  a  matrix  7.(\)  , 
which  is  defined  to  be  a  full  rank  matrix  with  dimeiision  n  x  t,  whose 
columns  span  N  [ A(x)  j  ,  the  null  space  of  A(x)  .  Such  a  matrix  is  bv 
no  n^ans  unique,  and  the  precise  definition  will  varv  with  the  context. 
One    verv    useful   wav   to  obtain  Z   is"  by  forming    the   OR  factorization: 


A(x)    =   [Y(x)        Z(x)l 


R(x) 
0 


(2.1) 


where  Y(x)  and  Z(x)  have  orthonormai  columns  and  R(x)  is  an  m  x  m  upper 
triangular  nntrix.  The  columns  of  Y(x)  form  an  orthonormai  basis  for 
R  [A(x)j,  the  range  space  of  A(x).  Note  that  Z(x)  is  not  uniquely 
defined   by    (2.1). 

The  first-order  necessary  condition  for  x  to  be  a  solution  of 
(l.l)  is  the  existence  of  a  vector  of  Lagrange  multipliers,  X,  such 
that 


q(x,X) 
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S(x)    -    A(x)X 

c(x) 


(2.2-) 


All  vectors  are  column  vectors,  but  for  convenience  we  '.^ite  (x,A  )  for 
[?^j  .  Note  that  the  first  part  of  q(x,X  )  is  the  gradient,  with  respect 
to   X,    of    the  La^rangian   function 


L(x,X  )    =  f (x)    -     I      X^c^(x)    . 
1=1 


Let  W(x,X  ■)  be    the  Hessian,    with   respect    tox,    of   ^Ax,\^,    i.e.. 


W(x,X)    =  V-f(x)    -     I      X^V-c^(x) 
1  =  1 


The  second  order  necessary  condition  for  x  to  be  a  solution  of  (1.1)  is 
that  the  projected  Hessian  Z(x)  W(x,X  )Z(x)  be  positive  semi -definite. 
A  sufficient  condition  for  x  to  be  a  solution  of  (1.1)  is  that  (2.2) 
hold  with  Z(x)'^W(x,X  )Z(x)  positive  definite.  [The  exact  form  of  Z(x) 
is  ImTTHterlai  to  these  conditions,  provided  that 

R  [Z(x))  =  W[A(x)'^)).  We  will  assume  that  there  Is  Indeed  a 
solution  (x^,X^)  where  these  conditions  hold.  Since  this  assumption  is 
made    throughout   the    oaper,    we   state    it    formailv: 


As  sump  tlon  j_.  Assume  that  there  is  a  solution  x^  to  (1.1),  ^•rLth 
A^  =  A(x^)  bavins  full  column  rank,  X^  satlsfvlng  (2.2),  and 
z'^Wj^Zjt  =    Z(xjt)'^W(x*,X^)Z(x*)   positive    definite. 

'^ny  numerical  methods  have  been  suggested  for  approximating  a 
solution   to    (1.1).      In    this    paper  we   are    concerned   onlv  with    algorithms 
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whLch'are  based  on  some  variation  of  Newton's  method  for  soivin?  (2.2) 
or  a  related  svs tern  of  equations.  In  particular,  we  are  concerned  with 
methods  which  make  use  of  linear  approximations  to  the  constraints  at 
every  iterate  x^.  Other  classes  of  methods  include  the  "augmented 
Lagrangian"  or  "multiplier"  methods;  for  a  description  of  these,  see 
Bertsekas    (1982). 

The  usual  derivation  of  a  Newton-type  method  to  solve  (1.1)  is  to 
apply  Newton's  method  for  solving  nonlinear  systems  of  equations  to 
(2.2),  a  system  of  n  +  m  equations  in  n  +  m  unkno^^ms.  The  Jacobian  of 
q(x,X),    t-rLth   respect    to  (x,X),    is 


q'(x,X) 


W(x,X)  -A(x) 


A(x) 


(".T) 


Let    C'^],.^!,)   be    the  kth   iterate    of    the   Newton   process,   which    is      defined 
by 


Ax, 


q'(x^,X^)[^J    =    -q(x,,,X^) 


"k+1 


l^k+ij 


-k- 

(Ax 

+ 

h. 

AX 
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WCx^.X^)  -A(x^) 


A(x,)^  0 


k+l 


R(x^) 


:(x^) 


(2.4) 


^k+1   =   '^k   +  P    • 


An   alternative   derivation  of    (2.4)   is    to  view     each      step     of      the 
orocess   as   solving   a  quadratic   program  or  "inodei   prohiem": 


Tiin  _  p'-W(xj^,Xj^)p  +  g(xj^)"D 


(2.5) 


T 
subject    to  A<x^)    p   =    -c(x^)    . 


This  is  easily  seen  to  be  equivalent  to  (2.4)  provided  that  (2.5)  has  a 
unique  rnininutn.  The  Lagrange  multipliers  defining  the  solution  of  the 
quadratic  program  are  given  hv  X^_^^  in  (2.4).  One  of  the  advantages  of 
this  formulation  is  that  it  is  easiiv  generalized  to  handle  inequality- 
constraints.  The  earliest  reference  for  this  seems  to  he  ^lilson 
(1963). 

As  is  well  known,  the  system  (2.4)  can  be  solved  by  several 
techniques.  One  approach  is  block  Oauss  elimination,  a  method  more 
commonly  described  as  using  the  Schur  complement.  "^his  reauires  the 
use  of  W"-'-  and  is  not  recommended  since  ^^  may  be  singular  or 
ill-conditioned  even  if  (2.^^)  is  nonsingular  and  well-conditioned  [see 
Oill  and  Murrav  (i974b,  d.  45)j.  A  second  approach  is  to  rewrite  (2.4) 
using    (2.1).      Let  I   denote    Che   identity  matrix  whose   dimension  will     be 
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ciear     from      the    context,    let  Y,Z  and   R   be   given  by    (2.1),    with'  x    =   x,  , 
assumir^   -"^^^k^   ^^^   r'uli   rank,    and    let 


Q    = 


[Y      Z  ]  0 


an   orthogonal   natrix  or    order  n   +  m.      The      system      (2.4')      can      then      be 
^>rritten 


-A 
0 


00^ 


k+1 


-n' 


"•k+l   ~  -^k 


Xu    +   p 


1. e.  , 


y'^wy 


z^m 


y'^wz 


z'^wz 


-R 

0 

0 


py 
Pz 

^k+1 


(2.fS) 


Ypv   +   Zp, 


=    ^v    +  P 


"k+1   -    -"k 


Here  it  is  understood  that  all  vectors  and  matrices  on  the  left  and 
right  hand  sides  are  to  he  evaluated  at  (x,  ,X-.  ).  This  system  now 
reduces    to   solving 
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^'''py   =  -c  (2.7a) 

(Z'^WZ)P2   =    -z'^q  -  z'^WYpy  (2.7b) 

p    =  YpY  +   Zp;,  (2.7c) 

^k+1   =    ^k   "^  P  ^-•^^~' 

'^  ^k+1    =  ^'^3  +  '^'^^^P  ^2.7e) 

It    is   clear    from   the  equivalence   ot    (2.4)   and   (2.7)    that     q'(x|^,X)^)      is 

T 
nonslngjiiar    Iff  R   and   Z    WZ  are  nonsingular.      Note   that   R    Is   nonslngular 

T 
Iff   A   has    full  column    rank.      Note  also    that    If    Z   WZ   is    nonsingular      but 

indefinite,      the      solution      of      (2.4)      identifies   a   stationary  point    of 

(2.5)   which    is   not   a  minimum,    and   which  will  not   be   useful  for     solving 

(1.1).      However   it    is  clear    from  Assumption    1   that  A(x,  )   will  have    full 

rank  and   ^(x.  )    W(xi^,A,  )Z(x.  )   will   be   positive    definite      if       ^^i<^'^i<:^      i=^ 

sufficiently    close  to    (x^,X^).      Furthermore    the   iterates    Hx.  ,X,  )}  will 

converge   quadratically  to      (x^y\^)     if      ^'^o'^o^      ^^      sufficiently      near 

(x*,^  *) . 

This      formulation      of   Newton's  method    is   essentially  the  algorithm 

given   by   Murray      and     Wright      (1978);      see     also     Wright      (1976).        The 

advantage      of      this   formulation   is    that   an   Indefinite   projected  Hessian 

T 
Z   WZ  can  be   detected,   and    hence  appropriate      action      can     be      taken     to 

ensure      convergence      to    a    minimum  of    (1.1),    not    iust   any   zero   of    (2.2). 

There   are    two    important    features    of   Murray  and  Wright's   algorithm  which 

are      not     included   above.      The  first    concerns   the  Lagrange  multipliers. 

Instead     of      (2.7e),      consider     using     the      "least      squares     multiplier 

estimate,"   taking  ^uii    as    the  solution  of 
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min  II  A(x^+pX    -  RCx^^pil^  (2.8) 

X 


i.e.  ,    solving 


RCx        )  X  =  Y(x        f^(x        )  (2.9) 

k+i        k+1  k+1  k+1 


It  is  intuitively  clear  that  using  the  'f  irs  t-order"  estimate  (2.8), 
obtained  from  the  new  information  at  x,  ^,  ,  should  he  preferable  to 
using  the  "second-order"  estitnate  (2.7e),  which  is  just  the  Newton 
prediction  of  values  at  x,  ^,  based  on  information  at  x.  .  Wright  (1976, 
p.  144)  shows  that  using  (2,8)  instead  of  (2.7e)  cannot  impede  the 
quadratic  convergence  of  the  method.  [ 8ee  also  Fletcher  (1974)  and 
Hill  and  Murray  (1979).j  Furthermdre,  Tapia  (1977)  shows  that  using 
(2.8)  gives  a  method  which  is  0-quadratically  convergent  in  x,  in  the 
sense  of  Ortega  and  Rhelnboldt  (1970),  while  ' 'the  iteration  (2.7)  is 
0-quadraticaliy  convergent  in  (x,X  )  but  not  necessarily  in  x.  However, 
when  (2.8)  is  used  the  method  is  no  longer  a  Newton  iteration  for 
solving    (2.2).      We    shall    return    to    this    point    later. 

We  note  in  passing  that  Tapia  (1977)  actualiv  refers  to  two 
Lagrange  trultiplier  computations  during  the  course  of  an  iteration.  He 
names  these  H  and  U,  and  he  emphasizes  the  importance  of  using  the 
solution  ^u+i  of  (2.4)  to  define  H  although  the  least  squares  estimate 
may  be  used  for  U.  However,  this  distinction  is  necessary  only  when 
the  block  Causs  formula  is  used  to  solve  (2.4),  as  then  A  is  obtained 
first  and  x  is  obtained  from  X.  When  (2.7)  is  used,  U  is  no  longer 
required.         We      also      note      here      that      Tapia     makes      the        interesting 
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observatlon  that  the  Newton  method  is  equivalent  to  a  "diaf^onaJ.i^ed" 
augmented   Lagrangian  Fiethod;    see  his    paper   for   details. 

The  second  difference  between  (2.7)  and  ^lurray  and  ^'right's 
algorithn  is  that  the  latter  uses  a  different  right  hand  side  for 
(2.7a).  Provided        that        the      equation     used      converges      to      (2.7a) 

sufficiently  rapidly  as  x,  converges  to  x^,  the  quadratic  convergence 
is  retained.  The  arguments  for  using  the  modified  method  are  given  by 
Murray  ( 1969a, b),  Biggs  (1972),  l-Jright  (1976),  ^■furrav  and  Wright  (1978) 
and  Bartholomew-Biggs  (1992).  However  we  will  not  consider  tills  anv 
further  here.     .      ,.-■.-,,.,..•_,,.    ..,     ■,    „■ 

Let  us  now  turn  to  a  second  derivation  of  a  Newton-tvpe  method 
proposed  recently.-  by_  Ooodm&n  (1982,)..  The  central  idea  is  to  re\>n:ite 
the   optimaiity   condition   (2.2)  as,..     ■>.,.- 


r(x)  = 


Z(x)^g(x). 
c(x) 


=  -0 


(2.10) 


where  Z(x),  as  before,  is  a  matrix  with  full  column  rank  satisfying 
R  (Z(x)j  =  H  (,A(x)'^).  Note  that  (2.10)  is  a  svstem  of  n  equations  in 
n  unknowns.  There  are  many  possible  definitions  of  Z(x).  Let 
Z  =  Z(x^^)  be  the  matrix  provided  by  a  particular  implementation  of  the 
OR  factorization  (2.1),  for  a  given  point  x^ .  Goodman  shows  that  Z  can 
be  extended  to  a  smooth  matrix  function  Z(x)  defined  in  a  neighborhood 
of  x^^,  with  R  (_Z(x))  =  W  (,A(x)"J,  and  with  the  propertv  that  the 
Jacobian  of    r(x),    evaluated   at  x,  ,    is 


r'(x,^) 


(2.11) 


where  X|^  ^  ^  ^^k''  ^^  ^^^  least  squares  muitlpiler  estimate,  I.e.,  the 
solution  ot  (2.9)  with  k+1  replaced  by  k.  The  Newton  step  tor  soJ.vins» 
(2.10)    with    this   definition   of    Z(x)    is    thus    .^iven  by 


Z(x,^)'^W(x^,X^)"]    p    =    - 


AU^f 


Z(x^)^g(x^) 


c(x^) 


^k+1    ~    -^k 


^v   +  P    . 


or,    usin»    (2.1), 


r  zJw 


[y       z] 


p  =  - 


z^. 


I.e. 


yj^m 


z'^wz 


Py 
Pz 


(2.12) 


P    =   '^Py    +    -Pz 
^k+i    =    ^k    +   P        • 


This    system    is      identical      to      (2.7)      except      that      the      least      squares 
multiplier       estimate      is      used      instead      of      (2.7e).        Note     that      the 
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nonsinguiariCy  of  (2.3),  (2.f^),  (2.11)  and  (2.12)  are  ail  equivalent 
conditions.  "^Totice,  however,  that  solving  a  sequence  of  systems  (2.12) 
is  not  Newton's  method  for  any  particular  function  since  the  fiinction 
Z(x)  yielding  the  Jacobian  (2.11)  changes  for  each  point  x^.  Goodman 
points  out  that  such  a  method  amounts  to  applying  Newton's  method  to  a 
variable  function,  i.e.,  one  that  changes  at  every  iteration.  Thus  we 
refer  to  the  method  as  a  Newton-type  method.  Goodman  shows,  assuming 
nonsinguiarity,  that  the  method  is  quadraticaiiy  convergent.  Thus  the 
result,  mentioned  earlier,  that  using  the  J.east  squares  multiplier 
estimate  yields  a  quadraticaiiy  convergent  algorithm,  is  obtained  again 
from  a  nS'/  viewpoint.  Goodman's  results  are  very  interesting,  because 
they  shoJ  that  although  the  use  of  the  least  squares  estimate  does  not 
give  a  Newton  method  for  solving  (2.2),  it  does  result  in  a  Newton-type 
method   for   solving    (2.10). 

Notice  that  given  any  particular  choice  of  Z(Xj^),  all  other 
possible  choices  are  given  by  Z(x.  )V  for  some  nonsingular  t  x  t  matrix 
V,  and  hence  the  solution  of  (2.12)  is  mathemat icailv  independent  of 
the  choice  of  Z(x^).  (Gill  and  Murray  (197Aa,b)  have  emphasized, 
however,  that  the  choice  of  a  matrix  with  orthonormal  columns,  as  given 
by  (2.1),  is  important  for  numerical  stability.)  Nonetheless,  the  form 
of  the  Jacobian  (2.11)  is  applicable  onlv  to  the  particular  matrix 
function  Z(x)  which  Goodman  constructs  in  the  neighborhood  of  a  given 
X.  .  If  Z(x)  is  simply  defined  by  a  particular  implementation  of  (2.1), 
then  provided  this  is  continuously  dif ferentiable  at  a  point  x,  Goodman 
shows    that  the    Jacobian    of    (2.10)    has   the    form 
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z'^W  +  T(z'^ 


S) 


(2.13) 


where  T  is  a  tensor  ^iven  bv  T  =  Z  Z,  with  Z  represent  inf»  the 
derivative  of  Z.  It  would  he  necessay  to  use  (2.13)  if  one  wanted  to 
take  a  Newton  step  for  (2.10).  However,  (2.13)  reduces  to  (2.11)  for 
X  =  x^,  and  since  Z  ?  -»•  0  as  x,  ">•  x^,  ignoring  the  term  T(Z  g)  does  not 
impede  quadratic  convergence  of  the  method.  Such  a  procedure  is 
similar  to  the  Hauss-Mewton  method  for  nonlinear  least  squares  when  the 
residuals  are  zero  at  the  solution.  This  method  is  not  Newton's  method 
for  rraLnimizlng  the  sum  of  squares,  but  the  neglected  term  of  the 
Hessian  converses  to  zero  as  the  solution  is  approached,  and  so 
quadratic   convergence    is    retained. 

This   raises   an   interesting  apparent   paradox.      Consider     minimizing 
the   sun    of    squares 


1  T        1        T      9        1  'y 

(x)    =  1  II  r(x)ll  "   =  Y  II  7.Vgll  -  +  1  II  dl  -    . 


(2.14) 


Suppose  we  restrict  the  possible  choices  of  Z(x)  to  have  orthonormai 
columns.  The  function  <ti  (x)  is  then  clearly  identical  for  ail  such 
possible  choices  of  Z(x),  and  Newton's  method  for  minimizing  <!?(:<) 
[i.e.,  finding  a  zero  of  V(})  (x)j  is  independent  of  the  particular  choice 
of      Z(x).         Tt      would    appear    to    follow    that    the  Oauss-Newton   method   for 

minimizing  i>  (x)    is    independent    of    Z(x)    for    the   same      reason.         But      the 

1  7 

Gauss-Newton   step    to   minim.ize  _  II  r(x)ll  "    is,    by    definition.    Identical   to 

the  Newton  step   to    find    a   zero  of    r(x).      This   apparently   shows   that    the 
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Newton      step      to    find    a   ^ero  of    (2.10)    Is    Independent    of    the    partLcuiar 
choice    of    the    matrix    function   Z(x),    provided    that    Z(x)       Is     smooth      and 
has        orthonoriTiai        coiumns.        This      conclusion      Is      Inconsistent      with 
Goodman's    results. 

The  solution    to    the    paradox    Is   that    aithouq;h    the      Newton      step      to 

1  9 

minimize     (j)  (x)    =  —  II  r(x)ll  ~      depends    onlv    on  (JiC^)*    the  Gauss-Mewton   step 

Is    defined   not    b/   <^  (x)    but    hy    r(x).      Thus    the   quadratic   conversence      of 

T 
the      method     which      l.^ores   T(Z   g)    In    (2.13)    Is   not    onJ  v   slmlJar    to    the 

behavior      of      a      Gauss-Newton      method — the      method      Is        a        particular 

Gauss-Newton   irethod  for  minimizing    (2.14).      This    again    demonstrates    the 

quadratic  convergence   of    the   method. 

We  note    that    even  If    Z(x)    is  not   restricted      to     have      orthonormal 

columns,        but        only        to     satisfy        R(,Z(x)j    =    A/(,A(x)    j,      with     Z(x) 

continuously    dlf f erentiable,    the  Jacobian  of    (2.10)      is      still     of      the 

form      (2.13),      with    T      containing     an     extra      term.        To     see   this    let 

Z(x)    =  Z(x)V(x),    where  Z(x)   has    orthonormal  columns,    and    Z(x),V(x)      are 

caitlnuously      dlf f erentiable.        nif ferentlat Ing    r(x)    we    then   obtain   the 

*  T    T  * 

same    formula   as    before,    with    an      extra       term     V    Z    g,      where      V      Is      the 
derivative   of  V. 

Goodman's  results  have  some  features  In  common  with  earlier  work 
of  Tapla  (1974).  Tapla  also  defined  a  Newton-type  method  for  solvlnsj  a 
system  of  n  equations  In  n  unknowns,  I.e.  ,  a  statement  of  the 
first-order  necessary  conditions  which  does  not  Involve  Independent 
Lagrange    parameters,      Tapla's    formulation  was 

s(x)    =    g(x)    -   A(x)X(x)    =   0 
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where  a (x)  is  a  particular  Lagrange  nuitipiier  estimate  chosen  so  that 
this  system  is  indeed  equivalent  to  (2,2).  The  method  derived  rrom 
this  formulation,  although  stated  using  the  hlock  Causs  approach  to 
solving  svstems  of  the  form  (2.4),  could  aJ.so  be  t-rritten  as  (2.7)  with 
(2.7e)  replaced  by  ■^i^+i  =  ^(xi.4.1),  using  the  special  muJ.tipJ.ier 
estimate.  Tapia  shows  that  his  method  is  n-quadratically  convergent  in 
X,  and  again  it  is  necessary  to  consider  a  smooth  function  which 
changes  at  each  iteration.  Tapia  (1*^77)  also  mentions  some  other 
railtiplier   estimates    which    retain   0-quadratic   convergence. 

In  summary,  one  basic  method  has  been  discussed,  with  variations 
differing  onlv  in  the  choice  of  the  multiplier  estimate  used  to  form  W. 
The  best  choice  would  clearly  seem  to  be  the  least  squares  estimate. 
This  is  the  simplest  to  compute;  it  fully  uses  the  latest  x 
information;  it  approximates  the  first-order  conditions  in  the  least 
squares  sense;  and  it  results  in  a  method  which  is  0— quadrat ically 
convergent    in   x. 

In  this  paper  we  are  concerned  only  with  local  convergence 
results.  In  particular,  we  have  not  discussed  the  important  question 
of  the  choice  of  merit  function  to  ensure  progress  towards  the 
solution.  A  related  matter  is  the  line  search  needed  if  the  full 
Newton  step  given  by  (2.7)  is  not  acceptable.  These  questions  are 
addressed  by  many    recent    papers    in    the    literature. 
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3.    UPDATING   AN   APPROXIMATION   TO  W   OR   z'^W 

Let  us  now  consider  methods  which  solve  (l.i)  without  requi.rin«? 
the  formation  ot  the  Hessian  matrices  V-f,  {V-c.},  but  instead  update 
some  suitable  approximation  matrix.  Such  methods  are  variously  known 
as  quasi-Newton,  variable  metric  or  secant  methods;  we  wiJ.l  simpiv  call 
them  updating  methods.  See  Dennis  and  More  (1977)  for  a  survev  of 
updating  methods  for  solving  nonlinear  equations  (particularly 
Broyden's  method)  and  unconstrained  optim.iza  tion  (particularly  the  PSB, 
DFP  and  TiFGS  methods).  —These  methods  have  attractive  superlinear 
convergence  nroperties..  We  recall  from  Dennis  and  More  (1^77)  and 
Ortega  and  Rheinboidt- ( 1970)  that  a  sequence  {x,  }  which  converges  to  x^ 
is   said    to  be  one-step  0-superlinearly  convergent    if 

"  ^k+1    ~  ■'^*"    ^    "'k"  ^   ~  '^*"     '    ^^'^^  "^k  "*"    ^   as    k  >  °°    . 

.     .'        '     3.!"  i 

One  possible  approach  to  solve  (i.l)  is  to  update  an  approximation 
to  the  Jacobian  (2.3).  This  was  discussed  bv  Garcia  and  Mangasarian 
(1974),  but  it  clearly  has  its  drawbacks  as  it  ignores  the  structure  in 
(2.3).  Furthermore,  although  (2.3)  can  be  considered  a  symmetric 
matrix  by  changing  the  (arbitrary)  sign  of  the  Lagrange  multipliers,  it 
is  aJwavs  indefinite  and  hence  the  verv  successful  DFP  and  BFGS  methods 
are  Inapplicable.  A  natural  idea  is  to  consider  approximating  the 
matrix  W  alone,  since  the  other  blocks  of  (2.3)  are  known  from 
first-order  information.  However,  since  W(x^,X^)  is  not  generally 
positive  definite,  the  DFP  and  1?FGS  updates  are  not  directly 
applicable.         Therefore      Han      (1976)      and     Tapia      (1977)      independently 


-IS- 
suggested      considering        the        Hestenes-Poweii        augmented        Lagrangl.-^n 
f unct  ton: 


L^(x,X  ,p)    =  L(x,X)    +£.  Ilc(x)ll  -    , 


where  p  is  a  parameter  to  he  chosen.   It  is  easily  shown  that 


'^xx^A^''*'^*'P^  =W(x*,X^)  +pA(x*)A(x*)'^ 


is  positive  definite  if  p  is  large  enough,  given  Assumption  1.  Thus 
assuning  p  is  large  enough,  iteration  (2.4)  may  be  used,  replacing  W  bv 
an  approximating  matrix  B,  ,  obtained  using  the  DFP  or  BFGS  update 
formula  with 

^k   "  '^k+l   "  ^k    '      ' 

^k    ^^x^A^'^k+l'H+l'P^    -^x^A^^k'H+1'P^    •  ^^-^^ 

Han  uses  X,  _|_,  defined  by  (2.4),  or  equivalently  (2.7e),  and  proves  that 
{(xj^,X^)}  is  0-superiinearly  convergent  to  (x*,X^).  Tapia  shews  that 
if  X,^j  is  defined  by  (2.8),  Q-superlinear  convergence  of  (xi  }  is 
obtained.  Recently  Boggs,  Tolle  and  Wang  (1982)  and  Fontecilla, 
Stethaug  and  Tapia  (1983)  have  shown  that  '^-superllnear  convergence  of 
{xj^}  is  obtained  even  when  ^i^+j^  is  defined  hv  (2.4).  We  note  that 
although  the  methods  of  Han  and  Tapia  are  almost  the  same  for  (1.1), 
the  two  authors  give  quite  different  treatments  of  inequality 
constraints. 
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The  main  disadvantage  of  approximating  V-L.(x,X,p)  is  that 
choosing  p  can  prove  difficult,  since  the  algorithm  will  fail  if  p  is 
too  small,  and  the  approximation  matrix  will  he  ill-conditioned  if  p  is 
too  large.  Powell  (1978a)  suggested  a  different  method  which  does  not 
use  the  augmented  Lagrangian  function.  He  proposed  defining  s^ ,  v,  by 
(3.1)  with  p  =  0,  simply  modifying  the  BFOS  update  formula  to  ensure 
that  the  approximation  is  always  positive  definite.  Powell  proved  that 
if  the  sequence  generated  by  this  algorithm  converges,  the  rate  of 
convergence  is  R-superlinear  (see  Ortega  and  Rheinboldt  (1970)  for  the 
definition  of  this  term)  even  if  W(xj^,X^)  is  not  positive  definite. 
Powell  also  giave  conditions  tmder  which  such  an  algorithm  wouJ^d  be 
"two-step"  0-superlinearly  convergent.  We  wiJ.l  return  to  this  topic  in 
the  next    section. 

Let  us  now  consider  a  different  approach  altogether.  An  obvious 
idea  suggested  by  Goodman's  paper  is  to  apply  Broyden's  method  to  the 
nonlinear  system  (2.10),  which  has  two  components,  7.  o  and  c.  It  is 
easily  seen  that  '^royden's  method  updates  each  row  of  the  aporoximate 
Jacobian  independently  from  the  other  rows.  Thus  a  more  attractive 
idea  is  to  use  A(x)  explicitly,  updating  an  approximation  to  the 
derivative    of    z"g    onlv.      This    "partial   Rroyden"   method    is    defined  by: 
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P   =    - 


A(x^)^ 


c(x^) 


(3.2a) 


^k+1   =    ^k   +  P 


(3.2b) 


=^k   ~  ^k+1        ^k 


(3.2c) 


yi.    =  Z(x^.,)^g(x^,,)    -Z(x.)Vr(x,) 


^k+1'    >^^^k+l 


(3. 2d) 


^k+1 


\  + 


^^k-Vk^^k 

T 
^k   ^k 


(3.2e) 


Note  that  B,  is  a  t  x  n  matrix  and  represents  an  approximation  to  Z  W 
in  (2.11).  (Strictly  speaking,  the  Jacoblan  of  (2.10)  is,  in  general, 
(2.13),  but,  as  noted  earlier,  the  termT(Z  g)  converges  to  zero  as 
X.  ■*■  x^.  )  We  have  not  yet  specified  the  form  of  Z(x),  but  It  Is 
important  that  Z(x)  be  smooth  In  order  that  y,^  be  useful.  Let  us 
define  Z(x)  by  a  particular  implementation  of  the  QR  factorization 
(2.1).  There  is  no  guarantee  that  this  yields  a  smooth  Z(x)  In  the 
region  of  interest,  but  there  are  various  techniques  which  one  might 
use  to  enforce  this  if  necessary.  Coleman  and  "^orensen  (1982)  suggest 
some  possibilities,  and  we  discuss  the  matter  further  In  Section  5.  We 
will  therefore  simply  assume  that  Z(x)  Is  dif f erentiahle  In  the 
appropriate  region.  Let  us  now  state  the  algorithm  more  formally, 
showing   the  necessary  Initialization  and  making   use  of    (2.1).        We      use 


-T^- 


subscrtpt    k   to    Indicate    that    the    quantities    f ,s,c,A  are    to   he  evaluated 
at    x^. 


ALOORITHM  3.  I    (Local  version,    no  line    search): 


Choose  a    convergence    tolerance  TOL. 


Choose    ^Qj^Q.    where   B      is    a    t   x    n  matrix. 
Evaluate    f^.go.c^.AQ. 


Factor  A      =    [Y  7.] 


Set  k  -*-    0. 
^peat 


Solve   \  Py   =  ~  <^k 

Solve  (Vk^Pz  =  -  ^'k^^k  -  VkPy 


■:.ii. 


Evaluate    f^^^,,    g^+,^ ,    c^+,  ,    \^^ 


Factor  A^+^   =    [Y^+^        Z^^,  ] 


^k+1 
n 


If 


—        T         - 
^k+1   ^k+1 


<  TOL  then  STOP. 


Update:      Obtain  ^^+^    fron    (3.2c,d,e) 

Set   k  *■    k+1. 


There   is  no  guarantee  that  B,  Z.  will  he  positive  definite  or  even 


symmetric.  However,  it  can  he  svmnetrized,  and  the  modified  Choieskv 
factorization  of  Giii  and  Murray  (1974a)  can  he  used  to  check  positive 
definiteness    and    take   alternative   action  when   necessarv. 

The  following   local   convergence  theorem  can   he    stated    immediatelv. 

Theorem  3.1:  Let  .Assumption  1  hold,  and  let  f'^)^},  f^i^}»  etc.,  he 
given  hv  Algorithm  3.1  with  TOL  =  n.  Assume  that  Z(x)  given  hv  (2.1) 
is  continuouslv  dif ferentiahle  in  a  neighborhood  of  x^.  Then  there 
exist  e    >    0  and         6    >    0  such  that        if        II  x^-x^H    <    e        and 

II  B.  -  Z  (xj.)'W(x^,X  j(.)ll  <  5,  then  Algorithm  3.1  is  well  defined  and 
'\  -^    x^  at   a    one-step  n-superiinear    rate. 

Proof:  The  Jacohian  of  (2.10)  is  given  by  (2.13),  which  is  easilv 
seen  to  beLipschitz  continuous  at  x  =  x^.  Furthermore,  (2. 13)  reduces 
to  (2.11)  at  X  =  X*,  where  it  is  nonsinguiar  by  Assunption  1  (see  the 
remarks  following  (2.12)).  The  procf  follows  almost  immediately  from 
the  results  given  by  Dennis  and  More  (1977,  p.  58).  The  onlv 
modification  necessary  is  to  check  that  the  partial  Rrovden  update, 
using  the  exact  A(x)  instead  of  an  approximation,  retains  the 
superlinear   convergence   property.      This  is    easilv   verified. 

e 

One  interesting  aspect  of  the  oartiai  Brovden  method  is  that  X 
does  not  appear  anywhere  in  the  statement  of  the  algorithm.  The 
nultlpliers  are  involved  only  implicitly,  in  the  form  of  the  Jacobian 
(2.11).  However,  (3. 2d)  is  not  the  only  possible  definition  of  y^ 
giving    a   t   X    n   update  with  Q-superlinear   convergence.        In     particular, 
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it  Is  clear  from  the  results  In  the  next  section  that  reolacine;  (3. 2d) 
by  any  of  (4.1d,e  or  f)  will  give  a  Q-superlLnearlv  convergent 
algorithm.  The  proof  is  substantially  longer,  since  these  definitions 
of  y.  do  not  result  from  applying  a  Rrovden  or  partiaJ.  Broyden  method 
to   any  system  of    equations    such  as    (2,iO), 

Let  us  now  make  some  more  general  remarks.  We  define  a  method  to 
be  a  linearized  cons  traint  quasi-Newton  method  for  solving  (l.i)  if  it 
consists  of   the   iteration: 


A(x^)^ 


P   = 


"zCx^>'fg(x^)' 

c(x^) 


(3.3) 


"■k+l   ~    '^k 


Xu    +  P 


where  we  assume  only  that  R  ["^^x))  =  W(A(x)'}  with  Z(x)  continuously 
dif  f  erentiable  in  the  region  of  "lltiterest  and  where  1^%}  is  any  sequence 
of  t  X  n  matrices.  Xfe  have  from  the  discussion  in  !^ection  2  that  the 
Newton  iteration  (2.4),  or  enuivalently  (2 . 6) or  ( 2. 7)  ,  and  the 
Newton-type  iteration  (2.12),  are  linearized  constraint  quasi-Newton 
methods.  The  same  is  true  of  the  methods  of  Han,  Tapia  and  Powell, 
which  replace  W  in  (2.4)  bv  a  suitable  approximation,  and  the  partial 
Brovden  nethod  given  by  Algorithm  3.1.  Now  an  iteration  of  the  form 
(3.3),  which  is  intended  to  solve  (l.i),  may  equivaiently  be  '/iewed  as 
a  method  to  solve  the  nonlinear  system  (2.10).  We  may  therefore  apply 
the  Dennis  and  More  characterization  of  o-superlinear  convergence  (see 
Dennis    and  More    (1977,    p.    51)j    and    immediately    obtain,    using    (2.11): 


Theorem  3.2:  Let  .\ssunption  1  hold.  If  a  ilnearizeri  constraint 
auasl-Newton  nt;  thod  (3.3)  produces  a  sequence  {x^^}  conver<3;in£^  to  x^, 
the    rate    of    convergence    is    one-step   O-superiLnear    if    and    onlv   It 


Itm        ;; ■ ^       <^      .  (3.4) 


This  interesting  result  has  already  been  given  in  a  slightly  different 
form  by  ^g,gs,  Toile  and  Wang  (1982).  They  give  a  different  definition 
for  the  class  of  methods  they  treat,  using  an  n  ^  n  matrix  C^  to 
reolace  U  in  (2.4),  The  condition  they  give  for  i^-superlinear 
convergence    is  ,  .       r'  ''    .">"■■    ' 


llZ(x^)Z(x^)T(C^-W,)(x^^,-x^)ll    :■ 

k+oo  K+1  '     k   ) 


To  see  that  the  two  conditions  are  equivalent,  first  note  that  by 
(2.6),  C^  affects  the  definition  of  x^^.^  onlv  through  Z(x^)  C,^.  The 
definition  of  ^v^i  is  determined  by  the  other  component  of  C^,  but  this 
influences  onJ.y  Q,+i  which  is  unspecified  anyway.  Also  note  that  the 
first  factor  Z(x.  )  may  be  removed  from  (3.5).  Thus  given  C^  satisfying 
(3.5),  we  may  obtain  (3.4)  by  setting  \  =  Z(x^)  C^,  and  given  R^ 
satisfying  (3.4)  we  may  obtain  (3.5)  by  setting  C^  =  Z(x|^)R,^.  It  is 
rernarkabie  that  Goodmin's  formula  for  the  -Tacobian  (2.11)  makes  this 
simple    proof    of    the    characterization    theorem   possible. 

We   note    that    Boggs,    Tolle   and  Wang      showed      that      the     methods      of 
Tapia      and  Han   satisfv    (3.5),    thus    showing    that    they    generate    sequences 
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(x,  }   which   are    (^-superiLnearly   convergent.         Powell's      r^odificat ton      of 

Han's     method      does    not   satisfy    (3.5),    as   we  will   mention    in   Section    4. 

Verifying    that    the    partial   Rroyden  method    satisfies    (3.4)    is    one    of    the 

steps    in    the    proof   of   Theorem   3.1. 

It  is  interesting  to  consider  again  the  block  matrix  in  (2.6). 
^vfhen  suitable  linear  transformations  are  applied,  one  can  view  the 
Garcia  and  Margasarian  method  as  approximating  the  entire  matrix,  the 
methods    of  Tapia  and  Han  as    approximating    the    four  blocks  in   the     upper 

left      comer,      and      the   partial  Broyden  method   as  approximating   the    two 

T  T 

blocks    in   the   middle  row,   Z'-WY  and   ZMiJZ.      Together   these     make      up     the 

matrix      Z    W,     which      mav      be   viewed'  as   a    "one-sided   proiected   Hessian". 

Theorem   3.2   shows  that   approximating     this      matrix      in     some      sense      is 

essential     for  one-step  O-superiinear   convergence.      An  attractive   idea, 

however,     is    to  try   to  approximate -only  Z   WZ,   the      "two-sided     projected 

Hessian",      since      this      is      the  '  matrix     which      is   positive  definite    by 

Assumption   i.    This    is    the    topic   of    the   next    section. 
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4.    UPOATTNG    AN   APPRnXIMATTOM   TO   zJwZ 

t^^^en  the    constraints    {c.(x)}    are    linear,    the    space        ^   [A(x)    j       Ls 

constant,       and      assunine;     7.      is    available    from    (1.1),    there    is    no   doubt 

T 
that   updating   an  approximation    to   Z^IJZ   is      preferable      to      updating      an 

approximation      to      the    full  Hessian   'J.    The   advantages    are    that    a   matrix 

T 
of    smaller   dimension    is    recurred    and    that    Z    WZ   is   positive   definite       in 

a   neighborhood    of    Xj.   by   Assumption    1.    This   approach   has    been   emphasised 

by   Gill  and   Murray    (i^TAc);    see  also  Goldfarb    (i<^76).      In    the      linearly 

T 
constrained      case,    the    term  Z    W   is    irrelevant    because , given  an    initial 

feasible    point,    all  subsequent    iterates    can   be    chosen   to      be      feasible, 

and      hence      c      and      py      are     zero      in      (2.7).        The   idea    of    updating   an 

T  T 

approximation    to    7.   \^7.  in   the   nonlinear   case » .  disca-rding    the      Z   '•!¥      term 

in      (2.7),      was     suggested      bv     Murray  and   Wright    (1973),    again    for    the 

purpose    of    reducing    the    dimension    .of      the      approximating      matrix      and, 

especially,      to     allow     maintaining     a      positive   definite   approximation 

without    the    introduction   of    the    parameter  p    in    (T.l).        This      idea      has 

also      been      discussed      by  Tanabe    (1981),    FJ  etcher    and  'Jomersiev    (1982), 

Gabay   (1982)    and  Coleman   and   Conn    (1982).      In    this    section  we   present    a 

new    algorithm  and    analyze    its    convergence    properties. 

An  updating  method    for  nonlinear   constraints   which    is    based    on    the 

T 

iteration  (2.7)   but    ignores   the   Z'-'.-JY   term  can     hope      for,      at     best,      a 
two-step  0-superllnear   convergence   rate,   i.e., 

II  x^+-^    -  x*ll    <■    ct^llx^_i    -  x*ll     ,  with  a^  -»■    0   as   k  -»■  »    . 

This      is      explained    by   Powell  (1978a)     in   the   context    of   modifving  Han's 
algorithm.      In   Powell's   method   an   approximation    to    the   full     Hessian     W 
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is  maintained,  but  only  the  Z  WZ  portion  of  W  has  any  relation  to  the 
approximating  matrix,  since  the  latter  is  positive  definite  while  W  is 
not.  The  reason  that  it  is  appropriate  to  consider  a  two-step 
convergence  rate  is  essentially  as  follows.  Let  x,  _■>  be  the  current 
iterate.  If  z"l>rY  is  omitted  from  (2.7b'),  or,  as  in  Powell's  algorithm, 
replaced  by  a  term  which  does  not  in  any  sense  approximate  it,  solving 
(2.7)  gives  a  point  x.  for  which  the  error  II  x,  -x^ll  cannot  be  expected 
to  be  smaller  than  II  x,  _, -x^ll  .  However,  the  solution  py  of  (2.7a) 
should  give  an  accurate  step  to  the  constraints,  vielding  a  value  H  c,  II 
which    is    relatively    small   compared     to      II  x,  _-,-x^ll  ,      assuming      that      the 

t 

scaling  is  reasonable  and  that  ^v.i  i^  close  to  x^.  On  the  next 
iteration,  the  error  incmrred  by -dropping- the  Z^JY  term  is  proportional 
to  II  c,  II  ,  so  the  next  point  generated,  ^i^-i-]^ »  has  an  error  II  Xu^j^ -Xj.il 
which  is  s-mall  relative  to  H  x.  _, -x^ll  .  this  argument  is  made  more 
precise   later.  ""  ;- .  -■ 

Let     us      therefore      suppose   that   an  updating  method   such  as   DFP    or 

T 
BFGS   is    to  be  used    to  maintain  an  approximation      to      Z   WZ.        The      first 

question   to  be  addressed   concerns   the    definitions   to  use    for   s,    and   y^^. 

The  obvious   candidates      which      come      to     mind      are      as      follows,      where 

subscript     k     denotes      evaluation     at      x,  ,      and   ^u+i    is    tlie  solution  of 

(2.8). 
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(a)  s^  =  Z^+^'^(x^+-^-x,,) 

or    (b)  .,,  =  Zj(x,,+px,p 

or    (c)  s,.  =  Z,^+^    x,^+^    -   Z^   x^,    and 

(d)  y^,  =  \+A^^+i  -  i^^-W+i^]  (^-1) 


'^  ^^^   ^k  =  \+i'^Uk+i  -  (%-Vk)) 


or    (f )      v^   =   Z^'^Cg;^^^    -  A^+^X^^.,    -  ff^) 


or    (g)      v,^   =  \+i^8k+l   -  ^^k^^k 


Note   that    (d)    is  equivalent    to 


•/k   =  \+i'^l^x^K4-l.H+P    -^x^^^k'H+i>> 


and  (e),  (f)  are  obtained  likewise.  ■Definition  (r)  is  the  same  as 
(3. 2d).  Case  (c)  can  be  eliminated  from  consideration  immediateJv, 
since    if,    for   example,    one    of    the    constraints    c,(x)    =  Z^xj",    it      follows 

T 

that    Z^^X|^    =   0  for  any   x.  ,    and    therefore    s^   is    always    zero. 

Fletcher  and  Womersley  (1982)  suggest  using  either  (a)  or  (b) 
together  with  (g),  ard  Cabay  (1982)  suggests  a  similar  method.  Since 
there  is  no  guarantee  that  s,  y.  >0,  both  oapers  suggest  using  the 
°oweli  (1978a)  'lodif Ication  of  the  BFGS  update  to  guarantee  the  pronertv 
of  inherited  positive  def initeness.  ^^ollowing  Powell's  results,  they 
show  that  under  certain  conditions,  which  include  the  assumptions  that 
X.  -»■  x^  and  that  the  approximation  matrices  remain  bounded,  a  two-step 
O-superlinear  rate   of    convergence   is   achieved. 

Let  us   examine   definitions      (a)      and      (b)      more      closely.        It      is 


_2q- 
cieariy     possible      to      construct      examples    where    H  s,  II     is    verv    small    (or 
zero)    while  U  y,  II  ,    using   any    of      (d)      through      (<?"),       is      not.         It       this 
happens,      both      the   DFP    and  BFGS    updates    become   verv    large    in   magnitude 

(or    undefined).      This    difficulty    is    not   avoided      by     using      the      ^oweli 

T  T 

modification      of      the      RFGS    update,    since    we   may   have   «].    v^^    >>   s^^    \^k' 

where   R,     is   the   current  approximation   matrix.      It    is      for     this      reason 

that      Coleman      and      Conn  have   developed    an  algorithm  which   ensures    that 

Us,  II     is   not   small   compared   to  H  Yi-I  •      They   use  an   iteration  of    the      form 

T  T 

(2.7),    replacing  Z '"WZ  by   the  approximation  ratrix  and   Z  MJY  by    zero,   and 

they   define 

«k  =  hJr^k-^^.^'  Vk  =  \^  i^x^^^'H^  -^^K'^k^) 

where  x^^  is  a  special  intermediate  point  given  by  x,..  +  '^k'^Z*  Following 
the  Broyden-Dennis-More    .    convergence        theory        for        unconstrained 

optimization,  Coleman  and  Conn  show  that  their  method  is  locally 
convergent  and  that  the  rate  of  convergence  is  two-step  0-superllnear . 
The  disadvantage  of  this  method  is  that  two  sets  of  gradient 
evaluations  are  required  for  every  iteration  of  the  form  (2.7).  If  we 
define  a  set  of  gradient  evaluations  to  be  a  unit  of  work,  the 
convergence  rate   would  be  best   described   as   four-step  0-superlinear. 

We  note  as  an  aside  that  Gabay  (19S2)  suggests  a  second  algorithm 
which  requires  two  sets  of  gradient  evaluations  per  iteration  and  has  a 
one-step  O-superlinear  rate  of  convergence  if  x,  *  x*.  This  method  is 
not  of  the  form  (3.3).  Again,  if  gradient  evaluations  define  a  unit  of 
work,    the   convergence   rate    remains   two-step  Q-superlinear. 

Our     approach      requires      only     one   set    of    gradient   evaluations  per 
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iteraCion.  The  idea  Is  simple:  if  "  Si.ll  is  consiiered  to  he  too  small, 
we  do  not  update  the  approximating  matrix.  Although  one  might  think- 
that  such  a  rule  would  impede  superlinear  convergence,  in  fact  we  are 
able,  under  certain  conditions,  to  prove  local  convergence  with  a 
two-step  0-superlinear  rate  of  convergence,  making  use  of  tht 
Broyden-Dennls-More    theory.      The    condition   on  H  s^^ll     is    as    follows.      Let 


Vi.    =  ^^v+ZCxv+i-^i,)    ,  (4.2a) 


'k    ~    'k+1    ^''k+l    ''k 


or 


vv    =  yJ(x.^,-x,.)  ■  ■  ■  (4.2b) 


'k    -    ^k    ^-^k+l    -^k 


where  the  choice  of  definition  corresponds  to  whether  (a)  or  (b)  is 
used  to  define  s,  in  (4.1).  We  update  the  approximating  matrix  using  a 
standard  update  formula  such  as  DFP  or  BFGS  if  and  only  if  the 
following  condition   holds: 


(k+1)^^ 


where      n       and     v       are        positive        constants        to        be        chosen.  The 

Broyden-Dennis-More  theory  for  unconstrained  optimization  states  that 
for  certain  updating  methods,  there  exist  e  >  0  and  6  >  0  such  that  if 
the  initial  point  x  is  within  e ,  in  norm,  of  the  solution  x^,  and  the 
initial  Hessian  approximation  is  within  £  ,  in  norm,  of  the  Hessian  of 
the  objective  function  at  x^,  then  (i)  {x^^}  converges  to  x^^  and  (ii) 
the  rate   of   convergence    is    one-step  Q-superiinear.      Although   in     theory 
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e      and     5    could    be    very    smaii,     In   practice    It    Is    found    that    superiinear 
convergence    Is   achieved  without   requiring   extremely     accurate      starting 
values    tor  x      and   B    ,      This   is    important    because   even    if    one    has    a    good 

0  0^ 

estimate  x  ,  computing  a  B  which  closely  approximates  the  Hessian  may 
be  prohibitively  expensive.  In  the  convergence  analysis  of  our 
projected  Hessian  updating  algorithm,  we  show  that,  given  any  v  >  0  for 
(4.3),  there  exist  e  >  0,  6  >  0  and  n  >  0,  such  that  when  the  algorithm 
Is  applied  with  initial  data  restricted  by  e  and  5  and  with  i  used  to 
define  the  condition  (4.3),  we  obtain  (i)  (x,  }  converges  to  x^,  and 
(ii)  the  convergence  rate  is  two-step  0-superlinear.  As  with  e  and  5 , 
T\  could  be  small  in  theory,  but  in  practice  we  find  that  moderate 
values   yield   superiinear  convergence. 

We  now  define    the   algorithm,    which   is    based      on      (2.7),      replacing 

T  T 

Z    WZ  by  an  approximation  and  omitting    the   Z   l-ZY   term. 
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ALOORITHM   4. 1    (Local  version,    no  line    search): 

Choose  a    convergence    tolerance  TOL. 

Choose   X    ,B    ,    where   B      Is    a   t   x    t    positive    definite   matrix. 
o '    o '  o  '^ 

Choose  V    >    0,    n    >    0, 
Evaluate    f  o^So'-^o' '^• 
Factor  A^   =    [\        ^'o^      \\ 


Set  k  -*-    n. 

■Repeat 

T 
Solve    Rj^    (py^    ~   ~  '^k* 


Solve    \(Py)    "   ~  \  %• 
Set   x^+^   =    ^],   +  \Py  +   \^Z' 
Evaluate    \+-^,o^+^,z^+-^,\+i' 
Factor   A^^^   =    [Y^^^        Z^^^  ] 


^k+1 


If    II 


pk+l^^k+l 


bk+i 


<  TOL  then  STOP. 


Update:      Obtain   ^u+i    from    some   Update  Rule 
Set  k  -*-    k+1. 

Update  Rule  I :  Let  s^,v^  he  defined  by  (4.1a, 4. 2a)  or  (4. lb, 4. 2b), 
and  let  y,  be  defined  by  any  of  (4.Ld)  through  (4.1^),  where  ^i^+j^  ts 
given   by    (2.9). 


If_(4.3)   holds    then  define  B^^^    by    the   PSB,    DFP    or   B^ns    update    formula 
else  set   B^+^   =  B^. 
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The   PS^^   and    OFP   update    formulas    are    c!;lven   bv   respecttveiv      set:tin^ 
b.     to    su   and   y,     in: 


^^k-Vk V  +  \^^k-Vk>'^    ^^k-Vk^^^kW^      ,,  ,, 

\+l   -   \  +  ^ -  T.     ^-^ •         ^^'^ 


\   ^k  ^\   ^k^' 


The  B^ns   update   Is    defined  by 


"k+1  -  \  +  — X —  "■~'^r 

^k   ^k  ^k   "^k^k 


Tn  practice  the  HFP  or  BFGS  formula  would  be  used,  since  these  updates 
have  the  desirable  property  of  Inherited  positive  def initeness.  The 
BFGS  update  usually  performs  better  than  DFP  on  unconstrained 
optimization  problems,  for  reasons  that  are  not  well  understood. 
Following  the  example  of  several  authors,  we  will  give  proofs  for  the 
DFP  update,  and  remark  that  these  are  easily  modified  for  the  BFGS 
case.  We  mention  PSB  simply  because  it  is  straightforward  to  include 
the  PSB  case  in   the    proofs  for   the   DFP   update. 

We  note  that  the  Cholesky  factors  of  B,  may  be  recurred  (see  Gill 
and  Murray  (1974c))  so  that  only  O(t-)  operations  are  required  to  solve 
the  linear  system  involving  B,  .  This  is  not  as  important  for  nonlinear 
constraints  as  it  is  for  linearly  constrained  and  unconstrained 
problems,  since  0(n'"ra)  operations  are  required  at  every  iteration  to 
factor  A^   and    form  Y^,\    (see  Wright  and   Glassman    (1978)). 

The  convergence  analysis  follows  a  statement  of  some  assumptions, 
notation  and    remarks. 
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As sumptions:    Throughout   the    rest    of    this   section,    we     assume      that 
Assumption      1      holds,      and      that      Y(x)      and   Z(x),    given  bv  a    particular 

implementation   of    (2.1),    are    continuously   dif ferentiahle   in    the      region 

9  2 

of    interest.      We   recall   that   V'-f    and    {V    Cj_}    are   assumed    to   be  Lipschit:^ 

continuous   matrix    functions.      Let   TOL  =   0    in  Algorithm   4.1    so      that      an 

infinite  sequence   of   points    is    defined. 

T 
Notation:  Let        ^k   ^  '^k   ~  ^*>        ^^^        ^*  ^  ^*  ^''*^*'        ^""^        ^^^ 

0^  =  Z^"W*Y^,  where  *  denotes  evaluation  at  (xj(..,X*).  Let  X  (x)  be  the 
least  squares  estimate  at  x,  i.e.,  the  solution  of  (2.8)  with  x^_^j^ 
replaced  by  x,  and  let  W(x)  be  an  abbreviation  for  W(_x,a(x),.  We  also 
write  W,  for  W(x.  ),  X^^  for  X(x^).  T^caU  that  'I  -  >:  denotes  the 
Euclidean  vector   norm     and     corresponding      induced     matrix     norm.        The 

notation     ilXliw,      where     X,M  are  n  x    n  matrices   and  M    is    nonsingular,    is 

f  T  ^  1  /'' 

used  for   the  weighted  Frobenius   norm  (trace  (MXM)    (MXM)J    '". 

Remark:  Consider  the  function  2(x)  -A(x)X(x),  which  vanishes  at 
x=  x^.  The  Jacobian  of  this  function  is  W(x)  -  A(x)X(x),  where  A 
represents  the  derivative  of  X(x).  Since  X  (x)  may  be  written  as 
^  A(x)''-A(x)j  ~^A(x)'''g(x) ,  it  follows  from  the  assumptions  that  A(x),  as 
well  as  U(x) ,  is  Lipschitz  continuous  in  a  neighborhood  of  x^.  Thus 
the   irean   value  theorm   (Ortega   and  Rheinboldt    (1970),    p.    70),   gives    both 


«8k+i  --ViH+i  -  (.^k--Vk^  -  '^\  -  V(-k))  K^i-k^"  <  "-k+i-k"- 


and 
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where     C      is      the     Llpschitz      constant,    provided   H  e^  II  ,    H  e.  ^,  II    are   smaii 
enaa^h.       ''Je    thus    have    both 


\''(^k+l-Vi^k+P    -  \''«k   -  ^X^^k+i-^k^"    ^    Cll  x^^^-x^ll  -  (4.5> 


for  II  e^ll  ,    II  e,  ^- II    smaii   enough,    and 


"  ^k'^^k   ~  ^k^\^k"'  ^    ^"  ^k"  "  ^^'^^ 


for  II  «ii,ll     smaii      enou^H.        These'     ine:quaiities      could      aiternativeiy      be 
derived  using  Goodman's    formula   C'2-.  iO. 

Re  ma  rk :      Recall      that~    th^   Banach      perturbation   lemma    (0rte9;a   and 

Rheinboidt    (1970,    p.    4  5))   shows    that    if  X,X  are    nx    n   matrices,    with      X 

nonsinejjlar,      II  X~-'-|l    <    a      and     II X- XII    <   — ,      then     X     is      invertihle  with 

2a 

II  (X)"^ll    <     2a. 


Remark:    Note    that   bv   definition  of  Algorithm  4.1,    we   always  have 


V(^k+i-^k)  =  -  V^-k  (^-7) 


\^(^k+i-^k)  =  -  V'^^'^k  •  (^•'^^ 


The     first     lemma     shows      the     quadratic     contraction      of      certain 
quantities . 
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Lerana        4.1:       Assume      that      for      siven     k,      II  e,,ll       and     II  e,,4.j^ll       are 
sutflclentiy    snaii.       Then       3'"q>    a    constant    independent    of    the    value   of 
k,    such    that 

(O  IIY^e^^^ll    ^    C^lle^ll- 

(ii)        "Ve^ll    <    C^Clle^^ll-  +  lle^.^ll-) 


(iit)      ll\^(x^4.i-x^)ll    <    C^CIIe^ll-   +lie^_ill-) 


Proof:       A     Tavior   expansion   9p.ves,    using   c^   =   0,    for    some    constant 
f^.  .    ...     . 


<^k  -AuV'    ^    Clle^l|2 


W^^k-^k+l^-k+P"    <    ^"^k"' 


and   hence   b/    (4.7), 


"\-V^k+l"    '    Clle^ll 


We    now    have    (i)    bv    appiving    the     Banach      lemma      to      shew      R^  is 

bounded  (independent  of  the  value  of  k)  if  II  e^ll  is  small  enough,  since 
R^  is  nonsin^lar  by  assunption-  Note  that  (i)  also  holds  when  k  is 
replaced   by    k-i,    using   the   same    argument.      To    obtain   (ii),    note    that 
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\^^k'l    '    "\-l%«    +"(\-\-P^^k"     . 


and  using    (i ) ,    with  k   replaced   by   k-1,    we    get,    for   some    constant  C, 


\V    '    C(lle^.il|2   +llx^-x^_^ll    lle^ll) 


which   yields    (ii).      For   (iii),    observe    that 


\^^^k+r^k^"  ^  "\^-ek+i"  +"V^k"  • 


We  now  give  a  theorem  which  shows  the  important  two-step 
contraction  in  the  error. 

Theorem  4. 1 :  Suppose  that  Algorithm  4.1  is  applied  with  any  update 
rule.  Let  ic  be  a  given  constant.  Then  3  ^  i  >  ^  s.t.  ,  for  any 
iteration   k,    if   'I  ^^Ij^H    <    <    and  H  e^_^il    <    ^i   then 

(i)      lle^ll    <    C^lle^_^ll 


and   furthermore,    if   "  Bj^  H    ^    <,    then 


(ii)      lle^+^ll    <    C^(lle^_^ll-  +  II  (\-H*)zJe,^ll  ) 


and 


(iii)      lle^+ill    <    C^(lle^^_^l|-    +  II  (\-H*)zJ(x^+pXj^)ll)    . 
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Here   C,    is   a   constant,    dependent    on  <    and  e.    Hut    not    on   k.      Assune      tor 


convenience    that   C.    >     1, 


Proof:    First   note 


^k"    <    "^k-^k-l"    +"^k-l" 

^    «\-l\-l^k-l    +  ^k-l\-l^^k-l"k-i"    -^ll^k-i" 


— T 
We      new    have    (i)    by    noting    that   H  \_ill     i-S    bounded    (independent    of    k)    by 

T 
the  Banach    iemnia,    and    that    c   and  Z    g    are    dif f erentiable    functions      x>rLth 

T 

c^    =    0     and     Z^   g^    =    0.         For      (it),      we      first      combine    (i)   above   with 
Lemma    1    (i)    to  give 


V^k+l"    <    '^"e,,-!""  (^-9) 


where  C    is    a  constant.      From    (4.6)   and  the   orthogonality     of      [Y,         Z^] 
we    have 


\^Sk   -  \W(\\^   +  ^V^^k"    <    Clle^l|2 


and      hence,       using      (i ) ,      Lemma    1    (11),    and    the  Lipschitz    continuity    of 
W,Z,Y 


^k^^k   -"*^^kV    ^    Clle^.^r-    ,  (4.10) 


where  C    is    a  constant.      Thus 


nzjg^  -   (H*-B^)zje^  -  \Z^\e^^^   -  (xj^+^-x^))  II    <    5ll  e^.^II 


_3q_ 

Bv    (4.8)   and  the    bound   "'^jT^H    <    <    this    selves 


^^-^k+l"    ^    <lClle^_il|2   +  ,|(H^-H^)Z,Je^llJ 


We    therefore   have    (ii)    usin^    (4.'5)    and 

For   (lii),    we   have    from    (4.10)    thajt 

ll^'^Sk  -"*\''^k+l   ^   ("*-V\''^^k+l-^k)    ^  \\''(^k+i-^k)"    ^    ^«^k-i"" 
and    applying    (^.B)   again,     this    gives 

IIH*Z^^V+l"    ^    Clle,^_ill2   +  II  (R^-H,)zJ(x^^+^-Xj^)ll     . 

The    result    now    follows    from    (4.11),    (4.9)   and    the    inverttbility    of  H^. 

o 

Corollary    4.1:    If    x.    +    x*,    ll\^ll    <    <    for   all  k,    and 


_    «  (\-H*)Z^'^(x^^^-x^)!l 
w,,  =  +    n  (4.12) 

^         « -k+l-k'' 


then  x^^   -»■    x^  at    a    two-step  0-superlinear   rate. 


Proof;    From    (ill)   of   Theorem  4.1  we   have,    if  k    is    large   enough, 
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2 


lle^,,ll    «    C,  (lie.,  II-   +ai,llx,,,,-x,JI) 


and    thus    bv    (i)    of  Theorem  1, 


e,^+lll    <    C.[lle^_ill-  +C^(i    +cpa.^lle^_illj 


Thus 


iie^+,11        -     ■■    .    ,-     ■■: 

lin,_JllL_>    0  (4.13) 

1,^    "^k-l" 


which    is    the   desired    result. 


Coroiiary  4.i  is  a  variation  on  Powell's  characterization  of 
t\TO-step  superlinear  convergence  (Powell  (1978a)] ,  in  the  same  way  that 
(3.4)  is  a  variation  on  the  ^oggs,  Tolle  and  Wan?  characterization 
(3.5). 

Lemira  4.2:  Suppose  Update  Rule  1  is  used  in  Algorithm  4.1.  Tf  the 
update   formula  is   used   at   iteration  k,    i.e.,    (4.3)   holds,    then 

"^k+l"^k"    ^    ^^    +  n  )ll  s^!l     .  (4.14) 


Proof:    Suppose    definitions   (4.1a,    4.2a)   are   used   for   aj^.v^^.       then 


"^k+r'^k"  ^  "Vk^^^k+r^k^"  +  "V-k'^^'^k+r^k)"  ="V  +"=^k" 
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and   the    resuJ.t    follows  from    (4.3).      The   proof   is    equalJ.y      triviaJ.      when 
definitions    (4.2a,    4.2b)  are  used. 

The  next    theorem  gives   a    bound    relating    s^. ,  v-     and   Hj.. 

Theorem  4. 2 :  Let  s^,v^  be  defined  bv  (4.1a,  4.2a)  or  (4. lb,  4.2b), 
and  let  y.  be  defined  by  any  of  (4. Id)  through  (4.1g).  Then  3  e  ^  ^  '^ 
such    that    if  lie.il    <    £  os    "  ^k+l"    ^    ^2'    ^^^^ 

\\y^^   -  H^sJ    <    C-,[  ii  x^^.px^li  ^   +  max    (11  e,^li  ,    "  e^+^ll  )    H  ^i^+i-Xi^H  J    +  "  ^'*v,^ll       (4.15) 
where  C2    is    a   constant    independent   of   k. 

Proof :    From    (4,5)  we   have,    for  £9   small   enough. 


^k^(Sk4-l    -Ak+lH+P    -^k\    -\\    ^Vk^   ^   Vk''^    ^'^k+l-^k^"    ^    ^"^k+l-^k"' 


and   hence 


\'^^%+l    -  Ak+lH+1^    -  Zk^Rk   -  "*7^k'^(^k+l-^k^" 

<    C(lix^+l-x,^li2  +lle^li    «x^+i-x^ll)   +IIG*yJ(x,^+^-x^) 


where  C  is  a  constant.  This  completes  the  proof  when  Si^,Vi^  are  given 
by  (4.1b,  4.2b)  and  y^  is  given  by  (4. If).  To  obtain  (4.15)  when  y^  is 
given   by    (4.1g),    note    that 


(\+i-\)'^(^^+i  -Ak+iH+i)"  ^  ^"^k+r'^k"  "^k+i" 


-42- 
for   a  constant    C.      The    result    (4.15)    mav    he  estahiished    for      t'nt      other 
cases       in   a    similar   rnanner.      fJote   that    it    does    not   hold    for    s,     ^iven   hv 
(4.1c). 


Corollary    4. 2:    Suppose  T'pdate  Rule   1    is      used      in     Algorithm     4.1, 

with     either  the   PSB   or  DFP   update    formula,    i.e.      (4.4)   with  h,     =  s^^.    or 

-1  f^ 
b.     =  Vn    respect  ivelv.      define  M    =  I    in  the    former   case   and   M    =  Hj.    '  ~    in 

the    latter    case.      T^t 


!I  V,     -  H*s,  II 


Then      if      max    (U  e^^ll  ,    H  e. +i"  )  ^    ^  ■>      of     Theorem      4.2,      and    if    the   update 
forniila   is   used  at    iteration  k,    i.e.      (4.3)  holds,    we   have 


— ^ — !:_<  II  Mil  ^^ 

II    V(~l.,      II 


'k" 


and 


llv,  II 
Y,,  <     {2C9(1    +Ti)   max    (II  e^ll  ,    I'e.   .lO    +11^^11    -.-__}    .       (4.17) 


s 


k" 


Proof:  The  first  inequality  is  trivial  in  the  PSB  case  and  is 
easily  shown  in  the  HFP  case.  The  second  inequality  follows  from  Lemma 
4.2  and  Theorem  4.2. 
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Ue    next    s'tate   a   weii  known   lemma. 


LemnB    4,3:    Let  M  be  any  nonsingular  Tiatrlx   such    that 


^k  -  '^"^^k"     .     1 


.-1, 


<    -  ,    with    s^  =^    0 


(4.18) 


Assime    the  update    formula    (4.4)    is    used   at  iteration  k.    Then     hj^' si.    >   0 
and 


hm  ^ 


,-1 


,„  a  7II  y,  -rl^s,  .1 

— -J    IIB,  -H.IU,  +-■-..     JL-     (4.19) 


where  a  ^  (0,1],  a,, 02  are  constants,  independent  of  k,  and 


1  >  9.  = 


1IM(B^-H*)s^ll 


\^  H* 


\   ="* 


(4.20) 


Proof :  See  Hroyden,  tennis  and  More  (1973),  p.  2  33  combined  with 
p.    2  42. 

We  now  give  a  local  linear  convergence  theorem,  following  the 
Rroyden-Dennis-More    theory    for  unconstrained   optimization. 
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Theorem     4.3:       Suppose      "pdate     Rule      1      Is    user!  in  Aisorithm   4.1. 
Choose          a          value          v    ">    0             in             condition  (4.3).  Then 

\/re(0,n,3e>0,    5>0,ri>0,      such   that   if  He    II    _<   e    , 

II  B   -H^ll  ,^  <    5,    and  n     is    used    to   define    condition   (4.3),    then 

n  e^+^W    <    nie^.^ll     ,  k  ?    1    ,  (4.21) 

i.e.  ,   xj^  >    x^  at    least   at   a   two-step  0-linear    rate. 

Proof:    We    prove    the    result    for   the  PSB  and  OFP   cases,    leaving      the 

BFGS   case   as    an   exercise.      Let  a,a,,a2   be    the    constants    from  Lemma    ^.3, 

_  o 

and   define  y^,    M  as    in  Corollary   4.2.      Let  a^    =  II  Mil  ^a^.      Let  3    be      such 

that 


XII    <    B 11X11^ 


for   any    n  X    n   mtrix  X.    Let  ic    =    2liH;'^ll  ,   5    =    1/ (211  H^^^!!) ,    so    that,    bv    the 
Banach    lemma, 


X-H,^ll    <    5    =>  II  X   ^11    <    K    .  (4.22) 


Let  E,,C,  be  as  in  Theorem  4.1,  using  this  definition  of  <,  and  let 
£2*^2  ^^  ^^  In  Theorem  4.2.  Now  we  require  €  >  0,  5  >  0,  n  >  0  to  be 
chosen  small  enough    that 
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e    <    e  ^/C^ 


e    <    £9/0^^ 


6   <   5/(23) 


C^(e    +  2B6  )  <    r 


Mil  -   {209(1   +  n)C,   e   +  niiG^ii  }  <  1 


'1 


••  ■     4C9a  +  n)C,e  , 

(2a, 6    +09)   ( : ^  +  n(i   +-]    IIG^Il)    <    5    . 


(4.23) 


Such      a     choice      of     e  ,6  ,n      is      clearly      possible.        Now     let  11  e^!l    <    e, 
II  B  -H^ll  X,  <    5.      We   will   establish  by    induction   that 


(i)      IIB-H*II^  <    26    ,  .1   >    0   . 

(ii)      II  eyi    <    C^lle^.^ll    ,  1^1. 

(iii)      lle^^ll    <    rllej_-Jl     ,  j   >    1    •        .  (4.24) 

(i  V )      If_  the  update  formula   is    used   at    iteration    j , 

then  IIB^^^-H^IIm  <     [(1    -  aO  ^)^^-   +  a  ^y  jl"  B  .-H^ll  ^  +  a9Y  ^ 

else  B  :j+i    =  B  .    ,  i   >    0    . 


Note    that   (iii)    is    the   desired    result. 

For  j  =  0,  (i)  is  true  by  assumption  and  (ii),  (iii)  are  vacuous. 
For  (iv),  we  need  only  verify  the  inequality  assuming  the  update 
formila  is  used  to  obtain  B^.  Because  II  e^H  <  E  <  t -^  and  H  B^  II  <  < , 
Theorem  4.1   applies   (with   k  =    1),    giving 
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e^ll    <;    CJl  e   II    <    Gt   bv    (4.23), 


.\Iso,    II  e^ll    <    £9  and   so  Corollary   4.2  applies   (with  k  =   0),    ^ivin^ 


<    imil~Yo<    [2C2(1   +n)Cj^e    +IIGjtllnjl 


irtb^-M    ^s.ll  9  „ 

c  o     ,    „.,„  1.      .    ,  ,^    ,,     ,    _  ^^  _     ,    „  ^  „_^„„™  2 


M-^s 


<    _  bv    choice   of  e .n    in  (4.23) 
3     ■ 


Thus  Lemna    4.3   applies    (with   k  =    0)  ,    giving    (iv)    for    ]  =    0. 

Now,  assuming  (i),...,(iv)  are  true  for  i=0, 1 .  2, .  . .  ,k-l ,  we  want 
to   shew    that    they   hold    for    j    =  k."  • 

(i).  From  (i )  and  (iv)  we  have,  :regardless  of  whether  the  update 
fonmia    is   used   at    iteration    j,  ,  ,- ' 

«B^+^-H^IU,  -  IIB.-H^IU,  <     (2a^6    +0^)7^    ,  i  =  n,  i ,  .  . .  ,k-l    . 

Now    we    have   II  e  II    <    e    and  II  eJI    <    C,e    <    £9,    which      together     with      (iii), 
j=0, 1 , . . . ,k-l,    gives 


e^li    <    rL^^--i    C^e    ,  i=0,l,...,k    .  (4.25) 


Thus     Corollary   4.2  applies    for    j=0, 1 , .  . .  ,k-i,    giving   bounds    on   {y  ■}    to 
obtain, 
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j  =  0,  1 , .  . .  ,k.-l    . 


We    therefore  have 


B^+^-HaII,^  -  II  B.-H^IU^  <     (2a^5    +02)    [2C-,(1    +  n  )C^r  L  J/2  J    ^    ^ n_ ,|  ^^^ 


(j+1)^^^ 


j=0,l,, ..  ,k-l 


where  the  condition  (4.3)  has  been  applied  if  the  iipdate  formula  is 
used  at  iteration  j.  ^fhen  the  update  formula  is  not  used  we  have 
B  ._!_,  =  R.  and  hence  the  Inequality  holds  trivially.  Sunming  from 
j=0,l, .  . .  ,k-l  we   obtain 


4C^(1   +  n)C,£  , 

V"*"  M  ^    "  «o-"*"  M  +   ^2a^5    +  a  3)    [— ^__-i_  +  nl.  1   +  -J    ^  G^"  ] 

<    25 


by    choice   of  e  ,6  ,ri    in  (4.23).      This   proves   (i)   for    ]  =  k. 

(ii),(iii).  We  have  "  e,^_^ll  <  C^e  <  e  ^  bv  (4.25)  and  (4.23),  and 
"\-l"^*"  *^  266  <  6  ,  II  B^-H^II  <  2B6  <  6  from  (i)  and  (4.23).  Thus, 
using    (4.22),   Theorem  4.1   applies,    giving 


e^\    <    C^lle^.^ll 


and 
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^k+l"    ^    ^l<^"'^k-l"  ~   '^   23511  ejl  ) 


Applying    (4.25),    this    yields 

"^k+l"    ^    ^1<^^    +   ^^  )"^k-l" 

<    rlle^_^li    by    (4.2  3)    . 

(iv).  This  last  part  of  the  proof  is  almost  the  same  as  the  proof 
of  (iv)  for  7=0.  As  before,  we  need  only  verify  the  inequality 
assuning  the  update  formala  is  used.  We  have  H  e,  U  <  C-e  <  e  ^  bv  (4.25) 
and  (4.23),  and  "  e^^^^ll  <  H  e^_^li  <  C^e  <  t^  by  (iii),  (4.25)  and  (4.23). 
Thus  Corollary   4.2  applies,    giving 


^  '^     <    1IM1I^',^<    [l^Cjd    +n)C^e    +  IIG^lln]llMli  ^ 


M-^s^ll 


<    1  by    (4,23) 


Therefore  Lemma    4.3   applies,    giving    (iv).      This    completes    the   proof      of 
Theorem     4.3.        Note      that    it   covers    the   case  where  x.     =  x^^  for   some   k, 
which    implies    that  H  s,  II     =  H  v,  II     =    0,    the   update    formula    is    not   used      for 
1  >    k ,    and   e^  =   e,^^j^    =   ...    =    0. 


The      condition      (4.3)      defines      a      quantity     i)^  =  n/(k-t-l)      ^    which 
provides    an  upper   bound   on  II  v,  II  /ll  s^H     if    the   update    formula    is      used      at 
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iteration  k  and  a  lower  bound  on  II  v,  II  /ll  s,  II  otherwise.  It  was  possible 
to  prove  the  "bounded  deterioration"  in  4.24(i)  because  (fv  converges  to 
zero  sufficiently  rapidly  as  k  *  "  .  In  the  final  theorem  we  make  use 
of  the  fact  that  ^i,  does  not  decrease  too  rapidly  to  obtain  the  proof 
of    two-step   0-superlinear   convergence. 


Theorem  4.4:  Suppose  Update  Rule  1  is  used  in  .\lgorithm  1.  Let 
e  ,5  ,n  be  chosen  as  in  Theorem  4.3,  using  anv  value  r  ^  (0,1).  Then  in 
addition  to  the  result  (4.21),  we  also  have  that  the  rate  of 
convergence  is    two-step  0-superltnear. 

Proof :  We  may  assume  that  x^  t  x^  for  all  k,  since  otherwise  the 
result  holds   trivialiv.     We   therefore   wish   to   show  that 


Consider    the   following  partition  of    the    set    of    nonnegative   integers: 

Sj^   =    {k|the  update    formula  is   used   at  iteration  k,    i.e.,    (4.3)   holds} 
S2   =    {k|k  >    0,    k  ^  S^}    . 

We  will  consider  the  two  subsequences  (Cj^jk  e  S.},  i  =  l,2,  separately, 
arid  show  that,  for  j=l,2,  either  S.  is  finite  or  {Cv|l<-  ^  S.}  converges 
to  zero.  We  first  consider  f^i,i'<  ^^i)*  assuming  that  S-  is  not 
finite.  The  proof  that  this  subsequence  converges  to  zero  follows 
Broyden,    Dennis    and   More    (1973)    very    closely.      Let 
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S^   =    {k.jl  =  l,2,3,...} 


Pi  =  "\.-^*"m 


0^  =  j^^l,  [see   (4.2n)j 


^i   =  Y,^.  (see   (4.16)) 


T  1    /■> 

Note    that  (1    -ct6^    )'■'-<     1   -  a  ^.      Now   from    (4.24)  (iv),    we   have 


\.+r"*"M*     [(1    -ote2_)i/2   +ci^Y,,.]llB^_-H^il    +ajy^^_    , 


where,    proceeding  as   before   to  apply   Corollary   4.2,    and  usin<^    (4.3), 


Yj^.  <    2C2(1    +  n)C^r        ^       -^    e    + 


nil  G 


*" 


(k.+l)i+^ 


Note    that 


i 


Now      because      the      update    formula    Is   used   at    iteration  k    iff   k        S,,    we 
have 


\+i   =   \  ^^^  ^   ^^i 


I.e.  , 


^H+1   ^   '(^i+1) 
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Thus  we   have 


P  4^+1  <    (1    -  Oj^  +  a^Y^)p^  +  ot^Y^      ,    in.th  I  y  ^   <  <"    .  (4.26) 

i 

Ue    now    use  a   iemna      of     Han     (1976,      p.      274),      which      states      that      tf 
{a.},{b.}   are   sequences   of    nonnegatlve   numbers  with 


a^+l  <    (1    +Xib.)a.    +  X2bi    ,      I    h^ 


"1    ■ 
1 


where  X ^ ,X ^  are  positive  constants,  then  {a-}  converges  to  a  limit. 
Applying  this  to  (4.26),  we  have  that  {p  ^}  converges  to  a  limit. 
Rearranging    (4.26),    we  have      -c    -    ■,.     ^- 

^i'^i  ^   Pi  ~  Pi+1   "*"   (c^Pi  +  ao)^^^    . 

Taking  limits  we  have  P  i^  ^  -i-  0.  Hence  either  ^  ^^  OorOj^^  Has 
i  ->•  "  ,    and   in  either   case  we   have 


(B^^-H,)s^.ll 
; >    0   as    i  +  "    . 


S,       II 

'^i 


Thus,    using   either   (4.  la)    or    (4.1b)    for    the    definition   of    Sj^,    and     with 
U|^   defined   by    (4.12),    we    obtain 


(jj^     •»■    0   as    i  +  »    . 


Now    from  Theorem  4.1(iii),    for   i   large   enough, 
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X  1  1  1  u 


and   hence   bv    (  4.  2  4)  (i  1)  ,  (ill')  , 


Thus 


^k     =  li r"^    n  as    i  >  »    , 

i        "^k.-l"       , 


I.e.  ,    {Ci<.1t<-  e   S^}    converges    to  zero.        ,  -.,•  ^,      ,  r 

Finaiiv,  let  us  consider  {Ci^jk  G  S^},  assuming  S^  is  not  finite. 
The  reason  that  this  subsequence  converges  to  zero  is  essentiaiiv  that 
when  (4.3)  does  not  hold,  the  'I  v,  II  component  of  "  ^v+i~^i:"  i-^  large 
enough  that  the  quadratic  contraction  orthogonal  to  the  constraints  is 
sufficient  for  overall  superlinear  behavior.  From  (^,f>)  we  have,  for  k 
large    enough, 


"\^^k  -  \W^\\^  +  Vk^^^k"  <  ^"^" 


and    therefore,    using    (4.8)   and    the  Lipschit?:    continuity    of   W,Y,Z, 


-Vk^(^k+i-^k^  -"*\''^k  -  ^'*\"^k"  '   '^"^k"' 


for   a   constant  C.      Thus 


zje^ll    <    IIH;^1I     (IIB^II    II  Z J    (x^^+^-x^)!!    +11^*11    H  \^e^ll    +rile^ll^) 
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Now    suppose    Si-.v,     are   defined   by    (4.1b,    4.2b).      Note    r.hat  by    (4.24)  (i), 
II  \ll     is    bounded,     sav   by   3.      'fe    then   ?et,    using    the    fact    that    (4.3)    does 
not   hold    at   iteration   k, 


Z^'^e^ll    <    IIHl^ll    [B   i^lli2 ll\'^(^k+i-^k^"     "^''^'*"    "^^'^k"    +^ll^ic"^J 


Applying  Lenma    4.1    (ii),    (iii)    gives,    for    large    enough    kS    So, 


II  Z^'f'e^ll    <    Ii  h;^II    [  {3  ^-^ +  11^*11  }   C^(ll  e^ll  "  +  II  e^.^ll  ^)   +  CI!  e^ll  ^) 


Using  Theorem  4.1(1),    and   applying  Lemnia   4.1(ii)   again,    we   obtain,      for 
a    cons  tant  C  ,  ■  .•  •  •  -  ■  • 


^k+l"      ^    ^l"^k"      '^    ^l<^"\^ek"    ^"\'^^k"^     ^    C(k+1)^'^    "ek-l"~ 


Thus,    bv    (4.25),    setting  C    =  Cj^Ce,    we    have 


C,     =  _!ll^<    C(k+l)^^rL''/2  J*Oask^»,kGS^. 
^       II  e^^.^ll 


This  completes  the  proof  when  St^.v^^  are  defined  by  (4.1b,  4.2b).   It  is 
easily  modified  for  the  case  (4.1a,  4.2a). 
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5.    MIMERICAL   RESULTS 

All  the  algorithms  descrrlhed  In  this  paper  have  heen  tested  on  a 
sraaii  selection  of  test  problems.  The  programs  were  written  in  '^ortran 
and  run  on  a  VAX-11/780  at  the  Courant  '^la  thema  t  ics  and  Computi.nf» 
Laboratory/.  double      nreclsion      arithmetic,       i.e.,      approximately      16 

decimal  digits  of  accviracy,  was  used  throughout.  \  line  search  or  some 
other  strategy  to  enforce  progress  towards  a  solution  from  a  poor 
initial  guess  is  an  essential  part  of  a  practical  implementation  of 
thesse  algorlthiTK.  However,  for  comparative  testing  purposes  we  decided 
to  dispense  with  this  aspect  and  concentrate  on  the  local  behavior  of 
the  methods,  choosing  the  initiaJ,  values  carefuliv  so  that  convergence 
takes    place.  -  ,-.     .  r  ■  _     i     _       ■'',''' 

Let  us    first    consider    the    choice   ef    H   -for    the      updatin?^      methods. 

T 
The      simulcst    choice    is    7      for    the    partial   ^royden  methods    of    Section    3 

and      tl^      identity     matrix      for      the        algorithms        which        undate        an 

T 
approximation      to  W  or  Z  WZ.      However,    these   choices    were   not    generally 

adequate    for  convergence    unless  x     was    very   close    to   Xj..        We  did      not 

wish      to      use      ve rv      accurate    initial   values,    so   we   defined   R  by   doing 

coarse      finite      differencing.        This      is      explained      further  in        the 

descriptions    of    the   methods   which   follow. 

^1e  thod  DNI  (Discrete  Newton).  This  is  a  discrete  Newton  variant  of 
iteration  (2.7),  using  the  least  squares  multiplier  estimate  (2.9) 
instead     of      (2.7e).        The     matrix     WZ      is  approximated  by  differencing 

V    L(x,Xi^)   using   fon:>7ard    differences    at    Xj^    along    the    columns    of    Z,  ,    '.^7ith 

—  7  T  T 

a      difference      interval      of      10      .      Approximations    to   Z   '-lY   and   Z  WZ  are 
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then  obtained   by   naCrix  muit ip Ilea r, ion,    and    the    latter      is      symmf  trized 
by    averaging    it   vi  th    Itb    transpose. 

%  thod  r)N2.  This  is  the  same  as  DNl  ,  except  that  it  uses  (2.7e)  for 
the  TTuitipiier  estimate.  One  extra  finite  difference  operation  per 
iteration  is  required  to  obtain  an  approximation  to  '■'A'py.  The  least 
squares  estimate  is  used  to  obtain  the  initial  approximation  to  'IZ  for 
both  Methods    DNl    and  nN2. 

>fethod  Al_[Fuil  Hessian  (augmented) J  .  This  aproximates  the  ^essian  of 
the  augmented  Lagrangian'  funct  ion ,  defining  s,^  and  y,  by  (3.1),  where 
^k+1  ^^  given  by  (2.9),  and  defining  ^]^+i  by  the  ^FOS  update  formula. 
We  set  p  =  10  for  all  problems  except  H?;i04,  for  which  p  =  100.  We 
obtained     R      bv   dif  f  erencing 'V   L .  (x,X    ,p  ) ,    using  fon^jard   differences   at 

O        ^  X    A  O 

x  along  the  columns  of  I,  with -a  "coarse  difference  interval  of  10  "", 
and    symmetrizing. 


'^thod  A2  [Full  Hessian  (Powell)J.  This  is  Powell's  modification  of 
Method  Al,  using  p  =  0  in  (3.1)  and  modifying  the  RFGS  formula  to 
ensure  positive  def initeness.  See  Powell  (1978a, b)  for  details.  It  is 
not  appropriate  to  define  B  by  a  finite  difference  approximation  to  W 
since    this    is    not    generally  positive   definite.      Instead  we   defi 


ne 


^o=[Y        Z] 


I        0 
0        B 
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where     B      is      the       Lnitiai      natrlx     used      by  Tlethods    ^.l    throuq;h    C^    Csee 
below). 

^fethods   B1-'B4    (One-Slded   Projected   Hesstaii).        These      use      the      partial 


Broyden  updates.  Method  R4  is  Aigorithn  3.1,  and  Methods  Bl ,  B2,  B3 
are  obtained  bv  respectively  replacing  (3. 2d)  in  Mgorithm  3.1  by 
(4.1d,e  and  f).  The  initial  matrix  R  is  set  to  the  transpose  of  a 
finite  difference  approximation  to  WZ  at  x  ,  obtained  as  described  for 
Method   DMl,    except   with   a   coarse   difference    interval  of    lO"-". 

Nfethods  C1-C8  (Two-Sided  Projected  Hessian).  These  are  all  special 
cases  of  Algorithm  4.1,  using  Update  Rule  1  with  the  BFGS  update 
formula.  ^fethods  Cl-r4  define  s-^  and  v^  by  (4.1a,  4.2a)  and  Methods 
C5-C8  use  (4.1b,  4.2b).  The  definition  of  y,^  is  as  follows:  Methods 
C1,C5  use  (4. 2d),  ^tethods  C2,C6  use  (4.2e),  Methods  C3,C7  use  (4.2f) 
and  Methods   C4,C8  use   (4.2g).      The   parameters     were      given      the     values 

n    =    1,      V    =    0.01.         The      initial     matrix     R^   was    obtained   by    taking    the 

T 
approximation    to   WZ   described    for  Methods    B1-B4,    pre-mult iplying   by   Z    , 

and    symmetrizing. 

Methods  D  1-1)2  (Coleman  and  Conn).  Method  Dl  is  the  algorithm  given  by 
Coleman  and  Conn  (1982).  Method  t)2  is  the  variant  tbev  buggest, 
replacing   their   equation   (3.6)   by    their   equation   (1.4). 

We  now  describe  the  test  problems.  We  use  the  notation 
x=  (Ci,52>'''»^  •* »  0"^i-'^ting  the  transpose  sign  for  convenience.  The 
starting  points  x  were  chosen  so  that  convergence  to  the  solution  was 
obtained. 
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ProMem  Pj_  t  Wright    (1976)}.      n   =    2,    m   =    1. 
mi  n  5  j^C  9 

subject    to   Ci(x)    =Ct'   "^7    ~    2   =   0. 
Solution:       x^=    (-0.816407,    -1.15470). 
Starting   point:      x     =  (-1,    -1). 

Problen  P2   [Himnelblau  (1972),    Wright   (1976)).      n  =    3,   fi  =    2. 
min    1000   -Cr   -    2S5    -  E,]   -  ^  f,  2    ~?1?3 
subject    to   c^(x)    =  5?  +  ^2   +  ?3   -   25   =   0 

CjCx)    =  8^1  +   14^2  +   7^3   -    56   =   0 
Solution:      x^  =    (3.51212,    0.216988,    3.55217). 
Starting   point:      x^    =   (3,"  0.2,    3). 


Problem  P3   [Powell    (1973b)j  .      n  =   '5,   m   =    3. 

mi  n    }  i^  2^  3^  4'''  5        1    /r  3    ,  -  .  3    ,     ,  ^  2 
mine  -_-(,^^+f,  2+i; 

subject    to  ci^(x)    =  kI'+  K2   +^^l  +Cj  +C^   -    m   =   0 

C2(x)    =52^3   -    ^4^5   =   '^• 

C3(x)    =  ^l  +  ^l  +   I  =   0. 
Solution:      x*  =    (-1.71714,    1.59571,    1.82725,    -0.763643,    -0.763643), 
Starting   point:      x^   =  (-1.8,    1.7,    1.9,    -0.8,    -0.8). 
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Prohiem  P4_  [Wright    (1976)j.      n  =    '5,    m   =    3. 
This    is    a   modified    version   of    a   problem   ^iveu   bv   ^lieie,    et   al.    (I'^l'i). 

min   (?,-!)-    +   (5  1-^2)"    ^   (?2-^'3^^    +   ^^^  3-<  0''   +   ^^4-^5)'' 
subject    to   c^(x)    =  5  ^   +  5o   +  ?3   -   2   -   3/2"  =   0 

C2(x)    =  52    -  ?i    +  ^4   +   2    -    2  /T  =    0 

C3(x)    =  C1C5   -   2   =   0. 
Solution:       k^=    (1.11663,    1.22044,    1.53779,    1.97277,    1.79110). 
Starting   point:      x^    =   (1.2,    1.3,    1.6,    1.9,    1.7"). 

Problems  HSIOO,  HSIU,  HS104.  These  are  problem  numbers  100,  111  and 
104  respectiveiv  in  Hcclc  and  ScHittkox>7bki  (19R1").  '^he  reader  should 
refer  there  for  dttails.  They  are  stated  there  as  inequaiitv 
constrained  problems,  but  we  reduce  these  to  the  form  (1.1)  by 
specifying   oa!.y  the  active  constraints.     'Je  used    the    followine;     choices 


tor   x^ ; 
o 


^    .: 


HS    100.      n    =    7,   m 


x^    =   (2,    1.9,    -0.5,    4.3,    -0.6,    1,    1), 


HS   HI.      n    =   1 0 ,   m   =   3 . 

Xq   =  (-3,    -1.9,    -0.2,    -6.5,    -0.7,    -7.3,    -4.0,    -4.0,    -3.0,    -2.3) 

HS   104.      n    =    S,    m   =   4. 

Xq    =   (6.5,    2.3,    n.7,    0.6,    5.9,    5.6,    1.1,    0.4). 

The     computed      solutions    for  HSIOO  and  HS104   agreed   with    the    seven 
figures    of   accuracy  shown  bv   Hock   and   Schittkowski .      However,    for      some 
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reabon,    the    solution   computed    for  HSlll    agreed    only      to      three      ttgurtb 
with      the      solution      given     bv      Hock,      and    ^chittkowski   and    our   solution 
appears    to   be    rnore    accurate.      We   obtained 

X*  =    (-3.20231,    -1.91237,    -n.244427,    -6.56118,    -0.723098, 
-7.27423,    -3.59724,    -4.02032,    -3.28838,    -2.33437)    . 

Some   further   comments    on  the    implementation   follow. 
Cholesky  Factorization.      We  used    the   modified  Cholesky   factorization  of 

Gill      and     Murray      (1974a)      to     solve      the      systems   of    the    form   (2.7b), 

T 
verifving    that    all    the    approximations    to   Z    WZ   were      positive      definite. 

This      technique     was      also     used      to      check  x.7hether    the    initial  Hessian 

approximation   H      used   by  Method   Al   was   positive    definite.         In     several 

cases    it   was    not,    and    so    the   method  was   not    run,    as    indicated   by    "+"    in 

Table    1.    This    was  not    because  p    was      too      small,       but      because      of       the 

truncation  error    introduced  by  the    finite    differencing   of  pAc. 

QR  Factorization.  We  used  the  Wright  and  Olassman  (1978)  package, 
incorporating  column  pivoting,  to  implement  the  OR  factorization  (2.1). 
This  produced  na  trices  {Z,  }  which  varied  smoothly  with  {x^^}  on  all 
problems  except  HS104.  For  this  problem,  however,  Z^  varied  abruptly 
from  one  iteration  to  the  next,  causing  some  of  the  proiected  Hessian 
updating  ntthods  to  fail.  The  same  thing  happened  when  column  pivoting 
was  not  used.  As  mentioned  earlier,  Coleman  and  Sorensen  (1982)  have 
suggested  some  approaches  for  eliminating  this  difficulty.  We  instead 
followed  a  suggestion  of  Waiter  Murray  (private  communication).  We 
compute   the  QR  factorization   (2.1)   for  x   =  x   .      However,   for     k   >   0     we 
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dc      not      directly   form    (2.1)    for  x    =    x,^.      Let    the   OR  factorization    from 
the    previous    iteration   be    ariven,    i.e., 


\-l   =    \-l 


\-l 


Now   let 


S,.    = 


k   =  %-l   \ 


If    A,     is    close    to   Ai^_i  ,    S.    will  he  nearly   triangular,    in    the    sense    that 

the      elements      below      its    diagonal  will  he   small.      Therefore    if    we   form 

•  •  '.     -      ^'      -    .  =>.    •■•:1    -I    , 
the  QR  factorization  of    ?!,    without   col'jmn  pivoting,    i.e.. 


\  =Q 


we    can  expect   0    to   be    close      to      I      when      A^^      is      close      to      '^v-i  •        ^^"^ 


therefore    have  a   OR  factorization  of    A,     since 


\   =   (^-1^) 


with  0^  =  (Qi,_iQ)  close  to  ^]f__i  when  A^  is  close  to  A,^_t^.  We  used  this 
technique  for  solving  HS104,  and  the  results  in  Table  1  show  that  it 
was    effect  ive. 

We      summarize    the    results    in  "^able    1.    AJ.though   not   all   the   methods 
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have    been    stated    as   fomaliv  as    AlgorithTTs    3.1    and    -'t.l,    we   used    uniform 
tni tlailza tion   and    ternlnatlon    criteria,    tor   example    terminating    if 


I'^k+l"     = 


k+1  ^k+1 


■^k+l 


<  TOL 


—  8 
and      otherf^7ise■    proceeding      to     form     ^b+i  •        ^"'e     set  '''OL  to   10      ,      The 

following   quantities  are    displayed    in  'T'able   1: 

NT     -    the  number  of    iterations,    i.e.,    the    final  value  of  k+l. 

NF      -   the  number   of    evaluations    of    (f ,c,?,A),    including    the   evaluation 
at    X      but  not   including    the    finite    difference    operations 
required    to  obtain   R    .      For  Methods    \l    through    OS   we    have 
NF   =  NT  +  1,   for  Methods   ni,    n2   we   have   NF   =   2   NI  +  1 ,   and    for 
DNl    we  have  NF   =   (n-m)(NI-n    +   (NI+1)    (including    the    finite 
difference   operations    required   at   iterations    1    through   NI-1). 
Method  nM2    requires   an  additional  NX   evaluations. 

N!J     -   the  number  of   times  the  Rroyden  or  RFCS   update    formula  is   used. 
For  Methods    Al    through    R4    and  ni,    n2 ,    we  have   NU   =  NT    -   1 .      For 
Methods    C1-C8,    NU   is    the   number   of    times    (A. 3)   holds    during 
iterations    1    through   NT-1.      Vor  Methods   ^Nl ,    0N2,    we  have  NU  =  0. 


the    final   value   of   II  r^^j^ll  ,    using   the   standard  "ortran   floating 
point    notation. 
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Let  us  now  comment  on  the  resuJ.ts.  ''^irst  of  aJJ.,  the  results  for 
DNl  and  ^N2  do  not  show  anv  sl?inif  leant  arivanta>^e  Co  either  fom  of  the 
nuitlplier  estlTTate.  This  is  an  option  onlv  for  Newton's  method  or  a 
method  such  as  Al ,A?  which  approximates  a  full  Hessian  matrix.  Methods 
'^l  through  r)2  do  not  have  the  aptlon  of  using  (2.7e^  since  thev  do  not 
approximate  ''^.       ■  .- 

The  results  for  Method.  Al  on  Problems  "4  and  HSlll  could  he 
improved  hv  reducing  p,  therebv  improving  the  conditioning  of  the 
matrices  i\K  '''e  -chose — p  ^  10  because  It  is  difficult  to  know  an 
oDtimai  value  of  p  _a  priori,  z '5ven  then  It  was  necessarv  to  Increase  p 
for  Problem  HS104,  .where  we  used  p  =  100.  The  larger  p  is  made,  the 
worse  the  results  bfecoine",  "because'  of  increased  iil-conditionlng.  Thus 
the  sensitivity  of  -Method  Al  bo -the  choice  of  p  is  a  definite  drawback 
of  the  method,  Powell' s.  Method  A2. seems,  from  these  results,  to  be 
preferable.  However  we  note^ithat  no  proof  of  local  convergence  has  yet 
been  given  for  Method  A2.         '    ' 

The  partial  Rroyden  methods  R1-B4  are  quite  successful,  but  they 
do  seem  inferior  to  Method  A2  on  the  larger  problems.  The  fact  that 
thev  are  one-step  0-superllnearly  convergent  does  not  seem  to  be  a 
significant  advantage,  compared  with  the  disadvantage  that  thev  do  not 
use  the  ^'PG9,  update.  '''e  should  mention,  however,  that  Method  A2 
theoretically  has  a  one-step  o-superllnear  rate  of  convergence  en 
Problems  HSl^n  and  HSlll,  since  ^J^  is  positive  definite  for  these 
problems  (and  only  these).  The  partial  '^royden  methods  have  the 
advantage  that  they  maintain  an  approximation  matrix  with  smaller 
dimension    than    that    used   by  'fethod    A2. 

The      two-sided     proiected      Hessian   methods   C1-C8  compare    favorably 


with  Method  A2.  Thev  have  the  advantage  that  the  di-nenslon  of  ( "^i, }  is 
further  reduced.  Methods  ni-r)2  clearlv  suffer  from  the  extra  i»radient 
evaluations  required.  ^gthods  Cl-C^  ail  have  ahout  the  same 
performance  as    each   other.      The  same    is    true   of   Methods    '^l-'^'i. 

The  parameter  values  n  =  1  and  v  =  O.ni  for  Methods  Ci-CS  were 
successful  for  these  test  prohlems..  There  vrouJ.d  prohahiy  never  he  a 
need  for  a  different  choice  of  v,  since  the  oniv  purpose  of  this 
parameter  is  to  allow  sumnins^  the  infinite  series  in  the  proof  of 
bounded  deterioration  (4,?4)(i').  However,  these  methods  are  certainlv 
s^ensitive  to  the  value  of  n.  If  n  is  too  small  the  BFnS  update  mav 
ne^'er  he  used  and  the  converc;ence  hecomes  extremeJ.v  slow.  ^or  example, 
this  happens  when  n  =  10~^  for  Problem  H'^111.  On  the  other  hand,  if  n 
is    too    large,     the  method   irav   fail,    as   the    following    example   shows. 


Problem   P5.      n=2,m   =    l, 


This    example   was   provided   by  Richard  Byrd    (private    communication), 
min   f(0    =l^\+l^\-   S5^C, 


subject    to   Ci  (x)    =  =-^  +1=0, 
Solution:      x^   =    (5,1), 


Starting   point:      x^   =  (5.n001,    l.nnooi). 


If  Method  Cl  (for  example)  is  applied  to  Problem  PS  with  ^^  taken  as 
the  true  Hessian  at  x  and  n  =  1,  v  =  10  -,  the  first  update  causes  '^^ 
to  become  aJ.most  exactlv  singular,  since  v^s  is  nearlv  zero.  "''he 
convergence  theorem  does  not  apply  with  this  value  for  n ,  since 
II  Hj(.ll     =    1,      II  Mil     =    1,      II  C^ii     =    5,      and      Sn      is      not    less    than  ^r-,    violating 


-6S- 
(4.23).      iftien  n    =    in~^,    the  BFHS   update    fomuia   Is    not      useH,      B- 
and    the   al^crtthin    teminates    with    II  r^il    <  '^OL. 


C 


Finally,      we      outline      an      example      where    II  s    II    <<  llv   1!     for  "Methods 
Cl-CS.       Consider   a   problem  with   n  =    2  and    m  =    1,    with    the    constraint 


cCx) 


As  in  Problem  ^S,  this  constraint,  thouf^h  nonlinear,  has  linear 
contours.  Now  consider  an  obiective  function  f  t-Jith  one  contour 
(x|f(x)  =  1}  siven  bv  the  circle  of  radius  one  centered  at  (1,2"),  and 
with  a  second  contour  {xjf(x)  =  2}  ^iven  bv  the  circle  of  radius  three 
centered  at   (n,3).      (See  Figure    1.) 


f(x)    =   2 


>c^(x)    =   - 


'->   h 


c^(x)    =   0 


Figure    1 
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It    Is   clearlv  possible   to   construct:    a      smooth      function      f      with      these 

oropertles,      aithoii^h      ohtalnln?      a   fomuia   for   f   would   he    complicated. 

T 
We    therefore  have   x^   =    (0,0").      Let    x      =   (l,i).         Bv      construction,      7    g 

vanishes      at      x        as    well   as    at    x*.      Regardless    of    the    value  of    3        the 

step    to    the    next    point    K^     Is  orthogonal      to      the      constraint      contours, 

giving      X,    =    (1,   -1),        Now      f    can   be    chosen   so    that    g(x|)   has   a    nonzero 

first    component,    and    tlius    H  v   II    >    0,    although    H  s   II    =    '^.      We    can      perturb 

X        sllghtlv    so    that   H  s   H     Is    a    verv    smalJ.   positive    number  while   H  v   II     Is 

of   moderate   size.       Thus    If    the  RFGS   update    formula      Is      used,       11^,11       Is 

J. 

very      large.         However,       If     Update  Rule    1    Is    used,    ^.rLth    anv   reasonable 
value    for  t\  ,    B,    =  ^^   and    the    dlfflcultv    Is    avoided. 

It  Is  clearlv  Important  for  practical  purposes  to  deveJ.op  a 
variant  of  \J,gorlthm  i.  1  which  automatically  controls  the  value  of  n  In 
some    suitable  wav.      We    leave    this  as   a  topic    for   future    research. 


aL(XN0^^JL^n(?1ENTS 
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